Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 63 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 14 tok/s Pro
GPT-5 High 19 tok/s Pro
GPT-4o 100 tok/s Pro
Kimi K2 174 tok/s Pro
GPT OSS 120B 472 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

Grafting Pre-trained Models for Multimodal Headline Generation (2211.07210v1)

Published 14 Nov 2022 in cs.CV and cs.AI

Abstract: Multimodal headline utilizes both video frames and transcripts to generate the natural language title of the videos. Due to a lack of large-scale, manually annotated data, the task of annotating grounded headlines for video is labor intensive and impractical. Previous researches on pre-trained LLMs and video-LLMs have achieved significant progress in related downstream tasks. However, none of them can be directly applied to multimodal headline architecture where we need both multimodal encoder and sentence decoder. A major challenge in simply gluing LLM and video-LLM is the modality balance, which is aimed at combining visual-language complementary abilities. In this paper, we propose a novel approach to graft the video encoder from the pre-trained video-LLM on the generative pre-trained LLM. We also present a consensus fusion mechanism for the integration of different components, via inter/intra modality relation. Empirically, experiments show that the grafted model achieves strong results on a brand-new dataset collected from real-world applications.

Citations (3)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.