Papers
Topics
Authors
Recent
2000 character limit reached

Textually Enriched Neural Module Networks for Visual Question Answering (1809.08697v1)

Published 23 Sep 2018 in cs.CL and cs.CV

Abstract: Problems at the intersection of language and vision, like visual question answering, have recently been gaining a lot of attention in the field of multi-modal machine learning as computer vision research moves beyond traditional recognition tasks. There has been recent success in visual question answering using deep neural network models which use the linguistic structure of the questions to dynamically instantiate network layouts. In the process of converting the question to a network layout, the question is simplified, which results in loss of information in the model. In this paper, we enrich the image information with textual data using image captions and external knowledge bases to generate more coherent answers. We achieve 57.1% overall accuracy on the test-dev open-ended questions from the visual question answering (VQA 1.0) real image dataset.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.