Abstract
Multimodal interaction allows computer operators to communicate naturally
and intuitively with a system through modalities such as speech and gesture,
facilitating complex spatial tasks such as air-traffic control. Measuring
operators' cognitive load in real time allows the system to adapt when load
becomes high, easing the demand and avoiding stress, frustration and errors. This
dissertation explores the viability of using features extracted from multimodal
interaction data as symptomatic cues of high cognitive load.
Two empirical user studies were conducted to collect multimodal interaction
data under increasing levels of load in a traffic-management scenario. A novel
framework for collecting natural, unbiased multimodal input is presented,
addressing the requirements for designing multimodal tasks of varying complexity.
The first study used a speech and manual-gesture interface and examined
changes in conceptual communicative structures, namely the pattern of semantic
redundancy and complementarity. The results confirm that people are more
semantically redundant during low-load tasks and more semantically complementary
during high-load tasks. Consistent with modal models of working memory, people
manage high levels of load by diffusing communication across different modalities
with as little duplication as possible, effectively expanding their available
working memory resources.
The second, longitudinal study used a pen-gesture and speech interface and
examined changes to communication structures at the production level, correlating
the degree of modal degradation with cognitive load. The results show that
modal input degrades to a greater degree during high-load tasks than during
low-load tasks. The use of cognitive tools also increases with load, revealing
yet another type of index.
The feasibility of using multimodal interaction features as indices of cognitive
load is validated; future work should focus on assessing their sensitivity and
diagnostic value.