Using an LLM to Turn Sign Spottings into Spoken Language Sentences (2403.10434v2)
Published 15 Mar 2024 in cs.CV
Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a powerful LLM to improve SLT performance. Spotter+GPT breaks down the SLT task into two stages. The videos are first processed by the Spotter, which is trained on a linguistic sign language dataset, to identify individual signs. These spotted signs are then passed to an LLM, which transforms them into coherent and contextually appropriate spoken language sentences. The source code of the Spotter is available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
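The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the confidence threshold, the collapsing of repeated spottings, the prompt wording, and the function names `collapse_spottings`, `build_prompt`, and `translate` are all assumptions made for illustration. The LLM is passed in as a plain callable so the sketch does not depend on any particular API.

```python
from typing import Callable, List, Tuple

# A spotting is a (gloss, confidence) pair; this representation is an assumption.
Spotting = Tuple[str, float]


def collapse_spottings(spottings: List[Spotting], min_conf: float = 0.5) -> List[str]:
    """Drop low-confidence spottings and skip repeats of the last kept gloss."""
    glosses: List[str] = []
    for gloss, conf in spottings:
        if conf < min_conf:
            continue  # hypothetical threshold, not taken from the paper
        if glosses and glosses[-1] == gloss:
            continue  # same sign as the last kept one, e.g. spotted in adjacent windows
        glosses.append(gloss)
    return glosses


def build_prompt(glosses: List[str]) -> str:
    """Build an illustrative prompt; the wording is an assumption."""
    return (
        "Turn the following spotted sign glosses into one fluent "
        "spoken language sentence. Glosses: " + " ".join(glosses)
    )


def translate(
    spottings: List[Spotting],
    llm: Callable[[str], str],
    min_conf: float = 0.5,
) -> str:
    """Stage 1 output (spottings) -> stage 2 (LLM) -> sentence."""
    glosses = collapse_spottings(spottings, min_conf)
    if not glosses:
        return ""  # nothing was spotted, so there is nothing to send to the LLM
    return llm(build_prompt(glosses))
```

In use, `llm` would wrap whichever model is available, for example a function that sends the prompt to a chat endpoint and returns the reply text. Keeping the spotter output as plain glosses means the second stage needs no sign language training data, which is the separation the abstract describes.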