Can gestures revolutionize human-computer interaction? This paper explores freehand gestures combined with speech as a step toward natural, multimodal HCI, using a computerized map as the application context. It addresses the challenges of interpreting gestures in a multimodal setting and formalizes a method for bootstrapping the interpretation process by semantically classifying gesture primitives according to their spatio-temporal deixis. Results from user studies indicate that gesture primitives form co-occurrence patterns with speech parts, revealing two levels of gesture meaning: the individual stroke and the motion complex. These findings define a new approach to interpretation in natural gesture-speech interfaces and lay a foundation for future research in multimodal HCI.
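To make the co-occurrence idea concrete, the sketch below shows one way such patterns could be tallied and used to label gesture primitives by the speech they most often accompany. It is a minimal, hypothetical illustration only: the `Observation` record, the function names, and the example data are assumptions for this sketch and are not taken from the paper, which derives its classification from actual user-study data.

```python
from collections import Counter
from typing import NamedTuple, Optional

# Hypothetical gesture/speech alignment record; field names are illustrative,
# not drawn from the paper.
class Observation(NamedTuple):
    stroke: str        # gesture primitive, e.g. "point", "contour", "circle"
    speech_part: str   # co-occurring spoken unit, e.g. "here", "along this road"

def cooccurrence_table(observations) -> Counter:
    """Count how often each gesture primitive co-occurs with each speech part.

    A toy stand-in for the co-occurrence analysis described in the abstract.
    """
    return Counter((obs.stroke, obs.speech_part) for obs in observations)

def classify_stroke(stroke: str, table: Counter) -> Optional[str]:
    """Label a stroke with the speech part it most frequently co-occurs with."""
    candidates = {sp: n for (s, sp), n in table.items() if s == stroke}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

if __name__ == "__main__":
    # Fabricated example data, for illustration only.
    data = [
        Observation("point", "here"),
        Observation("point", "here"),
        Observation("point", "this building"),
        Observation("contour", "along this road"),
        Observation("contour", "along this road"),
    ]
    table = cooccurrence_table(data)
    print(classify_stroke("point", table))    # -> "here"
    print(classify_stroke("contour", table))  # -> "along this road"
```

In the paper's terms, such a tally would operate at the level of the individual stroke; interpreting a motion complex would additionally require grouping consecutive strokes before classification.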
Published in the International Journal on Artificial Intelligence Tools, this work aligns with the journal's focus on innovative AI techniques for human-computer interaction. Its treatment of gesture recognition and multimodal integration speaks to one of the journal's recurring themes: building more intuitive, user-friendly AI systems.