Papers
Topics
Authors
Recent
2000 character limit reached

TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering

Published 16 Jul 2024 in cs.CV and cs.AI | (2407.11383v1)

Abstract: In healthcare and medical diagnostics, Visual Question Answering (VQA) mayemergeasapivotal tool in scenarios where analysis of intricate medical images becomes critical for accurate diagnoses. Current text-based VQA systems limit their utility in scenarios where hands-free interaction and accessibility are crucial while performing tasks. A speech-based VQA system may provide a better means of interaction where information can be accessed while performing tasks simultaneously. To this end, this work implements a speech-based VQA system by introducing a Textless Multilingual Pathological VQA (TMPathVQA) dataset, an expansion of the PathVQA dataset, containing spoken questions in English, German & French. This dataset comprises 98,397 multilingual spoken questions and answers based on 5,004 pathological images along with 70 hours of audio. Finally, this work benchmarks and compares TMPathVQA systems implemented using various combinations of acoustic and visual features.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.