SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models (2309.11566v2)
Abstract: We introduce SignBank+, a clean version of the SignBank dataset, optimized for machine translation between spoken language text and SignWriting, a phonetic sign language writing system. Whereas previous work employs complex factorization techniques to enable translation between text and SignWriting, we show that a traditional text-to-text translation approach performs equally well on the cleaned SignBank+ dataset. Our evaluation results indicate that models trained on SignBank+ surpass those trained on the original dataset, establishing a new benchmark for SignWriting-based sign language translation and providing an open resource for future research.