Visual question answering (VQA) is a task that has received immense attention from two major research communities: computer vision and natural language processing. It has recently come to be widely regarded as an AI-complete task that can serve as an alternative to the visual Turing test. In its most common form, VQA is a challenging multimodal task in which a computer must provide the correct answer to a natural language question asked about an input image. The task has attracted many deep learning researchers following the remarkable achievements of deep learning in text, speech and vision technologies. This review extensively and critically examines the current state of VQA research in terms of step-by-step solution methodologies, datasets and evaluation metrics. Finally, it discusses future research directions for each of these aspects of VQA separately.
Artificial Intelligence Review – Springer Journals
Published: Dec 8, 2020
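To make the evaluation side of the common open-ended form of VQA concrete, here is a minimal sketch of the consensus accuracy used by VQA-style benchmarks. Each question comes with ten human answers, and a predicted answer scores min(#matching humans / 3, 1), averaged over every subset that leaves one human answer out. This is an illustration under stated assumptions, not the review's own method. The answer normalization is deliberately simplified (lowercasing and trimming only), whereas official evaluation scripts also handle punctuation, articles and number words. The function names `normalize` and `vqa_accuracy` are hypothetical.

```python
from itertools import combinations
from typing import List


def normalize(answer: str) -> str:
    # Simplified normalization (assumption): official scripts also strip
    # punctuation and articles and map number words to digits.
    return answer.strip().lower()


def vqa_accuracy(predicted: str, human_answers: List[str]) -> float:
    """Consensus accuracy for one question (hypothetical helper).

    For each subset that leaves out one human answer, score
    min(#humans in the subset who gave the predicted answer / 3, 1),
    then average over all such subsets. Needs at least two human answers.
    """
    if len(human_answers) < 2:
        raise ValueError("need at least two human answers")
    pred = normalize(predicted)
    gts = [normalize(a) for a in human_answers]
    scores = []
    # combinations() works on positions, so duplicate answers are kept apart.
    for subset in combinations(gts, len(gts) - 1):
        matches = sum(1 for a in subset if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)
```

For example, if 2 of 10 annotators answered "2" and 8 answered "3", predicting "2" scores 0.6, predicting "3" scores 1.0, and any other answer scores 0.0. The leave-one-out averaging means that an answer given by at least three humans still falls slightly short of full credit when exactly three agree (0.9). Full credit requires four or more matches.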