Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 41 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 21 tok/s Pro
GPT-5 High 20 tok/s Pro
GPT-4o 91 tok/s Pro
Kimi K2 178 tok/s Pro
GPT OSS 120B 474 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

SubCharacter Chinese-English Neural Machine Translation with Wubi encoding (1911.02737v1)

Published 7 Nov 2019 in cs.CL

Abstract: Neural machine translation (NMT) is one of the best methods for understanding the differences in semantic rules between two languages. Especially for Indo-European languages, subword-level models have achieved impressive results. However, when the translation task involves Chinese, semantic granularity remains at the word and character level, so there is still need more fine-grained translation model of Chinese. In this paper, we introduce a simple and effective method for Chinese translation at the sub-character level. Our approach uses the Wubi method to translate Chinese into English; byte-pair encoding (BPE) is then applied. Our method for Chinese-English translation eliminates the need for a complicated word segmentation algorithm during preprocessing. Furthermore, our method allows for sub-character-level neural translation based on recurrent neural network (RNN) architecture, without preprocessing. The empirical results show that for Chinese-English translation tasks, our sub-character-level model has a comparable BLEU score to the subword model, despite having a much smaller vocabulary. Additionally, the small vocabulary is highly advantageous for NMT model compression.

Citations (4)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.