Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 43 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 17 tok/s Pro
GPT-5 High 19 tok/s Pro
GPT-4o 96 tok/s Pro
Kimi K2 197 tok/s Pro
GPT OSS 120B 455 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology (2305.12720v1)

Published 22 May 2023 in cs.CL and cs.AI

Abstract: This study constructed a Japanese chat dataset for tuning LLMs, which consist of about 8.4 million records. Recently, LLMs have been developed and gaining popularity. However, high-performing LLMs are usually mainly for English. There are two ways to support languages other than English by those LLMs: constructing LLMs from scratch or tuning existing models. However, in both ways, datasets are necessary parts. In this study, we focused on supporting Japanese in those LLMs and making a dataset for training or tuning LLMs in Japanese. The dataset we constructed consisted of various tasks, such as translation and knowledge tasks. In our experiment, we tuned an existing LLM using our dataset and evaluated the performance qualitatively. The results suggest that our dataset is possibly beneficial for LLMs. However, we also revealed some difficulties in constructing LLMs in languages other than English.

Citations (6)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Github Logo Streamline Icon: https://streamlinehq.com