Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Attention driven reference resolution in multimodal contexts

Attention driven reference resolution in multimodal contexts In recent years a number of psycholinguistic experiments have pointed to the interaction between language and vision. In particular, the interaction between visual attention and linguistic reference. In parallel with this, several theories of discourse have attempted to provide an account of the relationship between types of referential expressions on the one hand and the degree of mental activation on the other. Building on both of these traditions, this paper describes an attention based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as a basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during the resolution process, with the candidate scoring the highest selected as the referent. One advantage of this approach is that the resolution process occurs within the full multimodal context, in so far as the referent is selected from a full list of the objects in the multimodal context. As a result situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Artificial Intelligence Review Springer Journals

Attention driven reference resolution in multimodal contexts

Artificial Intelligence Review , Volume 25 (2) – Aug 25, 2007

Loading next page...
 
/lp/springer-journals/attention-driven-reference-resolution-in-multimodal-contexts-6KX5lFg8Lu

References (42)

Publisher
Springer Journals
Copyright
Copyright © 2007 by Springer Science+Business Media B.V.
Subject
Computer Science; Complexity; Computer Science, general ; Artificial Intelligence (incl. Robotics)
ISSN
0269-2821
eISSN
1573-7462
DOI
10.1007/s10462-007-9022-9
Publisher site
See Article on Publisher Site

Abstract

In recent years a number of psycholinguistic experiments have pointed to the interaction between language and vision. In particular, the interaction between visual attention and linguistic reference. In parallel with this, several theories of discourse have attempted to provide an account of the relationship between types of referential expressions on the one hand and the degree of mental activation on the other. Building on both of these traditions, this paper describes an attention based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as a basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during the resolution process, with the candidate scoring the highest selected as the referent. One advantage of this approach is that the resolution process occurs within the full multimodal context, in so far as the referent is selected from a full list of the objects in the multimodal context. As a result situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous.

Journal

Artificial Intelligence ReviewSpringer Journals

Published: Aug 25, 2007

There are no references for this article.