
Can Out-of-Domain data help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection? (2311.16496v4)

Published 27 Nov 2023 in cs.LG

Abstract: The spread of fake news using out-of-context images and captions has become widespread in this era of information overload. Since fake news can belong to different domains such as politics, sports, etc., each with unique characteristics, inference on a test image-caption pair is contingent on how well the model has been trained on similar data. Since training individual models for each domain is not practical, we propose a novel framework termed DPOD (Domain-specific Prompt tuning using Out-of-domain data), which can exploit out-of-domain data during training to improve fake news detection for all desired domains simultaneously. First, to compute generalizable features, we modify the Vision-Language Model CLIP to extract features that help align the representations of the images and corresponding captions of both the in-domain and out-of-domain data in a label-aware manner. Further, we propose a domain-specific prompt learning technique which leverages training samples of all the available domains based on the extent to which they can be useful to the desired domain. Extensive experiments on the large-scale NewsCLIPpings and VERITE benchmarks demonstrate that DPOD achieves state-of-the-art performance on this challenging task. Code: https://github.com/scviab/DPOD.
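The core idea of the prompt-learning step — weighting training signal from every available domain by how useful it is to the target domain — can be sketched as follows. This is a minimal illustration assuming cosine similarity between domain embeddings and softmax weighting; the function names, the weighting scheme, and the prompt representation here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def domain_weighted_prompt(target_domain, domain_embs, domain_prompts):
    """Aggregate per-domain prompt vectors for a target domain.

    Each available domain's prompt is weighted by the softmax of the
    cosine similarity between its domain embedding and the target
    domain's embedding, so out-of-domain data contributes in
    proportion to its relevance (a sketch of DPOD's idea; the actual
    weighting in the paper may differ).
    """
    t = domain_embs[target_domain]
    domains = list(domain_prompts.keys())
    # Cosine similarity of each domain embedding to the target domain.
    sims = np.array([
        np.dot(t, domain_embs[d])
        / (np.linalg.norm(t) * np.linalg.norm(domain_embs[d]))
        for d in domains
    ])
    # Softmax over similarities gives normalized mixing weights.
    weights = np.exp(sims) / np.sum(np.exp(sims))
    # Weighted combination of the per-domain prompt vectors.
    return sum(w * domain_prompts[d] for w, d in zip(weights, domains))
```

Under this sketch, a target domain like "politics" would draw most heavily on prompts from domains whose embeddings lie close to its own, while dissimilar domains still contribute a small, nonzero share.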

