Abstract

As large language models (LLMs) have grown popular, an important trend has emerged: using multimodality to augment LLMs' generation ability, which enables them to interact better with the world. However, there is no unified view of at which stage, and how, different modalities should be incorporated. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge in formats ranging from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. Through this in-depth review, we expect to give scholars a deeper understanding of these methods' applications and to encourage them to adapt existing techniques to the fast-growing field of LLMs.
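To make the core idea of retrieval-augmented generation over multimodal knowledge concrete, here is a minimal sketch that is not taken from the survey. It assumes each knowledge item (image, table, code, audio) has already been converted to a textual surrogate such as a caption, a serialized table, a code snippet, or a transcript, and it substitutes a toy bag-of-words cosine similarity for the learned encoders (for example, CLIP-style dual encoders) that real systems typically use. The store contents, the stopword list, and the names `KNOWLEDGE`, `embed`, `retrieve`, and `build_prompt` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge store: each entry pairs a modality tag with a textual
# surrogate (image caption, serialized table, code snippet, audio transcript).
KNOWLEDGE = [
    {"modality": "image", "text": "photo of the Eiffel Tower at night in Paris"},
    {"modality": "table", "text": "country | capital ; France | Paris ; Japan | Tokyo"},
    {"modality": "code", "text": "def area(r): return math.pi * r ** 2"},
    {"modality": "audio", "text": "transcript: the weather in Tokyo is sunny today"},
]

# Tiny illustrative stopword list so function words do not dominate scores.
STOPWORDS = {"the", "of", "is", "what", "in", "at", "a"}


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned encoder."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, k=2):
    """Return the k store entries most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, k=2):
    """Prepend the retrieved evidence, tagged by modality, to the question."""
    evidence = "\n".join(f"[{d['modality']}] {d['text']}" for d in retrieve(query, k))
    return f"Context:\n{evidence}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?", k=1))
```

The resulting prompt would then be passed to a generator LLM. Converting every modality to text before retrieval is only one of the possible stages at which multimodal knowledge can be incorporated; other designs retrieve and fuse raw image, audio, or graph representations directly.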

References
  1. Multimodal biomedical ai. Nature Medicine, 28(9):1773–1784.
  2. Explanations for CommonsenseQA: New Dataset and Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3050–3065, Online. Association for Computational Linguistics.
  3. CM3: A Causal Masked Multimodal Model of the Internet
  4. Jointly Training Large Autoregressive Multimodal Models
  5. Characterizing attribution and fluency tradeoffs for retrieval-augmented large language models
  6. Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems.
  7. Flamingo: a Visual Language Model for Few-Shot Learning
  8. Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
  9. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence, 41(2):423–443.
  10. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
  11. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  12. Skeleton-to-response: Dialogue generation guided by retrieval memory. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1219–1228, Minneapolis, Minnesota. Association for Computational Linguistics.
  13. Retrieval-guided dialogue response generation via a matching-to-generation framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1866–1875, Hong Kong, China. Association for Computational Linguistics.
  14. Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
  15. AirConcierge: Generating task-oriented dialogue via efficient large-scale knowledge retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 884–897, Online. Association for Computational Linguistics.
  16. Evaluating Large Language Models Trained on Code
  17. MuRAG: Multimodal retrieval-augmented generator for open question answering over images and text. In EMNLP, pages 5558–5570. ACL.
  18. MuRAG: Multimodal retrieval-augmented generator for open question answering over images and text. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5558–5570, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  19. Re-Imagen: Retrieval-Augmented Text-to-Image Generator
  20. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
  21. ENT-DESC: Entity description generation by exploring knowledge graph. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1187–1197, Online. Association for Computational Linguistics.
  22. Binding language models in symbolic languages. ICLR.
  23. Fine-grained image captioning with CLIP reward. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 517–527, Seattle, United States. Association for Computational Linguistics.
  24. PaLM: Scaling Language Modeling with Pathways
  25. Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy
  26. Vqgan-clip: Open domain image generation and editing with natural language guidance. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVII, pages 88–105. Springer.
  27. Seyed Omid Davoudi and Majid Komeili. 2021. Toward faithful case-based reasoning through learning prototypes in a nearest neighbor-friendly space. In International Conference on Learning Representations.
  28. CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
  29. Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
  30. PaLM-E: An Embodied Multimodal Language Model
  31. Neural path hunter: Reducing hallucination in dialogue systems via path grounding. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2197–2214, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  32. A feature-space multimodal data augmentation technique for text-video retrieval. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4385–4394.
  33. Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
  34. Augmenting transformers with KNN-based composite memory for dialog. Transactions of the Association for Computational Linguistics, 9:82–99.
  35. Qingkai Fang and Yang Feng. 2022. Neural machine translation with phrase-level universal visual representations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5687–5698, Dublin, Ireland. Association for Computational Linguistics.
  36. BioReader: a retrieval-enhanced text-to-text transformer for biomedical literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5770–5793, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  37. Space efficient context encoding for non-task-oriented dialogue generation with graph attention transformer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7028–7041, Online. Association for Computational Linguistics.
  38. A survey on deep learning for multimodal data fusion. Neural Computation, 32(5):829–864.
  39. Luyu Gao and Jamie Callan. 2021. Condenser: a pre-training architecture for dense retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 981–993, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  40. PAL: Program-aided Language Models
  41. ComFact: A benchmark for linking contextual commonsense knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1656–1675, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  42. SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  43. Filtering before iteratively referring for knowledge-grounded response selection in retrieval-based chatbots. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1412–1422, Online. Association for Computational Linguistics.
  44. Search engine guided neural machine translation. In AAAI, volume 32.
  45. Extract, transform and filling: A pipeline model for question paraphrasing based on template. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 109–114, Hong Kong, China. Association for Computational Linguistics.
  46. KAT: A knowledge augmented transformer for vision-and-language. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 956–968, Seattle, United States. Association for Computational Linguistics.
  47. Unixcoder: Unified cross-modal pre-training for code representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 7212–7225. Association for Computational Linguistics.
  48. Retrieve and re-rank: A simple and effective IR approach to simple question answering over knowledge graphs. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 22–27, Brussels, Belgium. Association for Computational Linguistics.
  49. Realm: Retrieval-augmented language model pre-training
  50. Retrieval augmented language model pre-training. In International conference on machine learning, pages 3929–3938. PMLR.
  51. A retrieve-and-edit framework for predicting structured outputs. NeurIPS, 31.
  52. Retrieval-based neural code generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 925–930, Brussels, Belgium. Association for Computational Linguistics.
  53. Rethinking with Retrieval: Faithful Large Language Model Inference
  54. Fast and accurate neural machine translation with translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3170–3180.
  55. ALCAP: Alignment-Augmented Music Captioner
  56. Audio-text retrieval based on contrastive learning and collaborative attention mechanism
  57. Logical form generation via multi-task learning for complex question answering over knowledge bases. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1687–1696, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
  58. Reveal: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23369–23379.
  59. Empowering language models with knowledge graph reasoning for open-domain question answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9562–9581, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  60. Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
  61. Language Is Not All You Need: Aligning Perception with Language Models
  62. Gautier Izacard and Edouard Grave. 2021. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 874–880, Online. Association for Computational Linguistics.
  63. Atlas: Few-shot learning with retrieval augmented language models. arXiv
  64. Peter Jansen and Dmitry Ustalov. 2019. TextGraphs 2019 shared task on multi-hop inference for explanation regeneration. In Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13), pages 63–77, Hong Kong. Association for Computational Linguistics.
  65. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning, pages 4904–4916. PMLR.
  66. Shaping program repair space with existing patches and similar code. In ISSTA, pages 298–309. ACM.
  67. Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks
  68. InferFix: End-to-End Program Repair with LLMs
  69. Knowledge-enhanced evidence retrieval for counterargument generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3074–3094, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  70. Repair Is Nearly Generation: Multilingual Program Repair with LLMs
  71. Prompting visual-language models for efficient video understanding. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXV, pages 105–124. Springer.
  72. AttnIO: Knowledge Graph Exploration with In-and-Out Attention Flow for Knowledge-Grounded Dialogue. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3484–3497, Online. Association for Computational Linguistics.
  73. Knowledge graph-augmented language models for knowledge-grounded dialogue generation
  74. Dense Passage Retrieval for Open-Domain Question Answering
  75. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.
  76. Prefix tuning for automated audio captioning. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE.
  77. Audio retrieval with natural language queries: A benchmark study. IEEE Transactions on Multimedia.
  78. Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval
  79. Vgnmn: Video-grounded neural module networks for video-grounded dialogue systems. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3377–3393.
  80. BiST: Bi-directional spatio-temporal reasoning for video-grounded dialogues. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1846–1859, Online. Association for Computational Linguistics.
  81. Learning dense representations of phrases at scale. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6634–6647, Online. Association for Computational Linguistics.
  82. Constructing multi-modal dialogue dataset by replacing text with semantically relevant images. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 897–906, Online. Association for Computational Linguistics.
  83. Less is more: Clipbert for video-and-language learning via sparse sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7331–7341.
  84. TVQA+: Spatio-temporal grounding for video question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8211–8225, Online. Association for Computational Linguistics.
  85. Mining of Massive Datasets, 2nd Ed. Cambridge University Press.
  86. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
  87. Knowledge-driven encode, retrieve, paraphrase for medical image report generation
  88. A Survey on Retrieval-Augmented Text Generation
  89. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
  90. OPERA: Harmonizing Task-Oriented Dialogs and Information Seeking Experience
  91. Chain of knowledge: A framework for grounding large language models with structured knowledge bases
  92. Knowledge-grounded dialogue generation with a unified knowledge representation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 206–218, Seattle, United States. Association for Computational Linguistics.
  93. Automating code review activities by large-scale pre-training. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, pages 1035–1047. ACM.
  94. Maria: A visual experience powered conversational agent. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5596–5611, Online. Association for Computational Linguistics.
  95. Weizhe Lin and Bill Byrne. 2022a. Retrieval augmented visual question answering with outside knowledge. In EMNLP, pages 11238–11254. Association for Computational Linguistics.
  96. Weizhe Lin and Bill Byrne. 2022b. Retrieval augmented visual question answering with outside knowledge. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11238–11254, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  97. FVQA 2.0: Introducing adversarial samples into fact-based visual question answering. In Findings of the Association for Computational Linguistics: EACL 2023, pages 149–157, Dubrovnik, Croatia. Association for Computational Linguistics.
  98. Relational memory-augmented language models. Transactions of the Association for Computational Linguistics, 10:555–572.
  99. Retrieval-augmented generation for code summarization via hybrid GNN. In ICLR.
  100. Combining relevance language modeling and clarity measure for extractive speech summarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6):957–969.
  101. Xuliang Liu and Hao Zhong. 2018. Mining stackoverflow for program repair. In SANER, pages 118–129. IEEE Computer Society.
  102. Uni-parser: Unified semantic parser for question answering on knowledge base and database. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8858–8869, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  103. Video captioning with multi-faceted attention. Transactions of the Association for Computational Linguistics, 6:173–184.
  104. Audio-text retrieval in context. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4793–4797. IEEE.
  105. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
  106. ReACC: A retrieval-augmented code completion framework. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6227–6240, Dublin, Ireland. Association for Computational Linguistics.
  107. The StatCan dialogue dataset: Retrieving data tables through conversations with genuine intents. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2799–2829, Dubrovnik, Croatia. Association for Computational Linguistics.
  108. Faithful Chain-of-Thought Reasoning
  109. Open-domain question answering via chain of reasoning over heterogeneous knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5360–5374, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  110. Language Models of Code are Few-Shot Commonsense Learners
  111. Muscaps: Generating captions for music audio. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.
  112. Do the fix ingredients already exist? an empirical inquiry into the redundancy assumptions of program repair approaches. Companion Proceedings of the 36th International Conference on Software Engineering.
  113. Steve McConnell. 2004. Code complete. Pearson Education.
  114. Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates. In Findings of the Association for Computational Linguistics: EACL 2023, pages 274–288, Dubrovnik, Croatia. Association for Computational Linguistics.
  115. Augmented Language Models: a Survey
  116. Ambient search: A document retrieval system for speech streams. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2082–2091, Osaka, Japan. The COLING 2016 Organizing Committee.
  117. Demonstrating ambient search: Implicit document retrieval for speech streams. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pages 233–237, Osaka, Japan. The COLING 2016 Organizing Committee.
  118. A case study of NLG from multimedia data sources: Generating architectural landmark descriptions. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 2–14, Dublin, Ireland (Virtual). Association for Computational Linguistics.
  119. Learning joint embedding with multimodal cues for cross-modal video-text retrieval. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pages 19–27.
  120. Efficient text-based reinforcement learning by jointly leveraging state and commonsense graph representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 719–725, Online. Association for Computational Linguistics.
  121. Attention bottlenecks for multimodal fusion. Advances in Neural Information Processing Systems, 34:14200–14213.
  122. HybriDialogue: An information-seeking dialogue dataset grounded on tabular and textual data. In Findings of the Association for Computational Linguistics: ACL 2022, pages 481–492, Dublin, Ireland. Association for Computational Linguistics.
  123. WebGPT: Browser-assisted question-answering with human feedback
  124. FeTaQA: Free-form table question answering. Transactions of the Association for Computational Linguistics, 10:35–49.
  125. Retrieval-based prompt selection for code-related few-shot learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE’23).
  126. Entailment tree explanations via iterative retrieval-generation reasoner. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 465–475, Seattle, United States. Association for Computational Linguistics.
  127. OpenAI. 2023. Gpt-4 technical report.
  128. Training language models to follow instructions with human feedback
  129. CLTR: An end-to-end, transformer-based system for cell-level table retrieval and table question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pages 202–209, Online. Association for Computational Linguistics.
  130. Retrieval augmented code generation and summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2719–2734, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  131. Game-Based Video-Context Dialogue
  132. Are NLP Models really able to Solve Simple Math Word Problems?
  133. Text generation with exemplar-based adaptive decoding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2555–2565, Minneapolis, Minnesota. Association for Computational Linguistics.
  134. DreamFusion: Text-to-3D using 2D Diffusion
  135. UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text
  136. The strength of random search on automated program repair. In ICSE, pages 254–265. ACM.
  137. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR.
  138. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR.
  139. Zero-shot text-to-image generation. In ICML, volume 139 of Proceedings of Machine Learning Research, pages 8821–8831. PMLR.
  140. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR.
  141. Worldly wise (WoW) - cross-lingual knowledge fusion for fact-based visual spoken-question answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1908–1919, Online. Association for Computational Linguistics.
  142. Retrieval-augmented image captioning. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3666–3681, Dubrovnik, Croatia. Association for Computational Linguistics.
  143. Deep composer: Deep neural hashing and retrieval approach to automatic music generation. In 2020 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE.
  144. Retrieval-augmented transformer for image captioning. In CBMI, pages 1–7. ACM.
  145. Toolformer: Language Models Can Teach Themselves to Use Tools
  146. Simple entity-centric questions challenge dense retrievers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6138–6148, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  147. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618–626. IEEE Computer Society.
  148. Text is not enough: Integrating visual impressions into open-domain dialogue generation. In Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, page 4287–4296, New York, NY, USA. Association for Computing Machinery.
  149. RACE: Retrieval-augmented commit message generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5520–5530, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  150. Retrieval, analogy, and composition: A framework for compositional generalization in image captioning. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1990–2000, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  151. TIARA: Multi-grained retrieval for robust question answering over large knowledge base. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8108–8121, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  152. Few-shot table-to-text generation with prototype memory. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 910–917, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  153. Videobert: A joint model for video and language representation learning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7464–7473.
  154. D2S: Document-to-slide generation via query-based text summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1405–1418, Online. Association for Computational Linguistics.
  155. TegTok: Augmenting text generation via task-specific and open-world knowledge. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1597–1609, Dublin, Ireland. Association for Computational Linguistics.
  156. LaMDA: Language Models for Dialog Applications
  157. Plug-and-play VQA: Zero-shot VQA by conjoining large pretrained models with zero training. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 951–967, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  158. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
  159. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the conference. Association for Computational Linguistics. Meeting, volume 2019, page 6558. NIH Public Access.
  160. T2VLAD: global-local sequence alignment for text-video retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 5079–5088. Computer Vision Foundation / IEEE.
  161. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  162. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
  163. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  164. The isabelle framework. In TPHOLs, volume 5170 of LNCS, pages 33–38. Springer.
  165. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87–92, Brussels, Belgium. Association for Computational Linguistics.
  166. Sorting and transforming program repair ingredients via deep learning code similarities. 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 479–490.
  167. Incorporating background knowledge into video description generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3992–4001, Brussels, Belgium. Association for Computational Linguistics.
  168. Improving knowledge-aware dialogue response generation by using human-written prototype dialogues. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1402–1411, Online. Association for Computational Linguistics.
  169. Diverse and informative dialogue generation with context-specific commonsense knowledge awareness. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5811–5820, Online. Association for Computational Linguistics.
  170. DeltaNet: Conditional medical report generation for COVID-19 diagnosis. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2952–2961, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
  171. Response generation by context-aware prototype editing. In AAAI, volume 33, pages 7281–7288.
  172. Autoformalization with large language models. In NeurIPS.
  173. What do developers search for on the web? Empir. Softw. Eng., 22(6):3149–3185.
  174. Enhancing question generation with commonsense knowledge. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 976–987, Huhhot, China. Chinese Information Processing Society of China.
  175. Boosting neural machine translation with similar translations. In Annual Meeting of the Association for Computational Linguistics, pages 1570–1579. Association for Computational Linguistics.
  176. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, volume 29.
  177. Fusing context into knowledge graph for commonsense question answering. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1201–1207, Online. Association for Computational Linguistics.
  178. Modeling and discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014, pages 590–604. IEEE Computer Society.
  179. Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
  180. Writing by memorizing: Hierarchical retrieval-based medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5000–5009, Online. Association for Computational Linguistics.
  181. Z-LaVI: Zero-shot language solver fueled by visual imagination. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1186–1203, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  182. An empirical study of GPT-3 for few-shot knowledge-based VQA. In AAAI, volume 36, pages 3081–3089.
  183. LogicSolver: Towards interpretable math word problem solving with logical prompt-enhanced learning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1–13, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  184. Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
  185. Retrieval-Augmented Multimodal Language Modeling
  186. The unreliability of explanations in few-shot prompting for textual reasoning. In Advances in Neural Information Processing Systems.
  187. Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
  188. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
  189. Xiaojing Yu and Anxiao Jiang. 2021. Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3202–3212, Online. Association for Computational Linguistics.
  190. RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training
  191. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
  192. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
  193. Retrieval-based neural source code summarization. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE ’20, pages 1385–1397, New York, NY, USA. Association for Computing Machinery.
  194. Guiding neural machine translation with retrieved translation pieces. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1325–1335, New Orleans, Louisiana. Association for Computational Linguistics.
  195. KERS: A knowledge-enhanced framework for recommendation dialog systems with multiple subgoals. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1092–1101, Punta Cana, Dominican Republic. Association for Computational Linguistics.
  196. Multimodal Chain-of-Thought Reasoning in Language Models
  197. Generating synthetic speech from SpokenVocab for speech translation. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1975–1981, Dubrovnik, Croatia. Association for Computational Linguistics.
  198. Can ChatGPT-like generative models guarantee factual accuracy? On the mistakes of new generation search engines
  199. Verify-and-edit: A knowledge-enhanced chain-of-thought framework
  200. Unified vision-language pre-training for image captioning and VQA. In AAAI, volume 34, pages 13041–13049.
  201. Focus! Relevant and sufficient context selection for news image captioning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6078–6088, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  202. DocPrompting: Generating Code by Retrieving the Docs
  203. Yucheng Zhou and Guodong Long. 2023. Style-aware contrastive learning for multi-style image captioning. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2257–2267, Dubrovnik, Croatia. Association for Computational Linguistics.
  204. Modeling graph structure in transformer for better AMR-to-text generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5459–5468, Hong Kong, China. Association for Computational Linguistics.
  205. Visualize before you write: Imagination-guided open-ended text generation. In Findings of the Association for Computational Linguistics: EACL 2023, pages 78–92, Dubrovnik, Croatia. Association for Computational Linguistics.