- The paper presents the Qwen series of LLMs, with chat models aligned via SFT and RLHF for stronger instruction following and language understanding.
- The paper demonstrates that Qwen-14B outperforms previous open-source models of comparable scale across 12 diverse datasets.
- The paper highlights the tokenizer's high encoding efficiency and the models' tool-use abilities, including code interpretation and data visualization via ReAct prompting.
"Qwen Technical Report" (2309.16609): An Extensive Exploration of the Qwen Model Series
The Qwen series of LLMs, presented in the "Qwen Technical Report" (2309.16609), gives a detailed account of the development and capabilities of Alibaba Group's large language models. The series spans several parameter scales, with models aligned using modern techniques to improve performance on natural language processing tasks across many domains.
Overview of Qwen Models
The Qwen model series comprises a range of LLMs pretrained on massive datasets of trillions of tokens. The Qwen base models exhibit robust performance across a multitude of tasks. Particular emphasis is placed on the Qwen-Chat models, which are aligned with Supervised Finetuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). These chat models follow human preferences more closely and demonstrate advanced competencies in tool use and complex task execution, such as code interpretation and mathematics.
Figure 1: Model Lineage of the Qwen Series, highlighting the progression from Qwen to Code-Qwen, Code-Qwen-Chat, and Math-Qwen-Chat models.
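To make the SFT stage concrete, the following is a minimal sketch of supervised finetuning on a single prompt-response pair, with prompt tokens masked out of the loss. The checkpoint name and the example pair are placeholders, and this is not the paper's actual training setup or data.

```python
# Minimal SFT sketch (illustrative only; model name and data are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen-7B"  # any causal LM checkpoint works; Qwen repos need trust_remote_code
tok = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "Explain ReAct prompting in one sentence."
answer = "ReAct interleaves reasoning steps with tool calls and their observations."

prompt_ids = tok(prompt, return_tensors="pt").input_ids
answer_ids = tok(answer, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, answer_ids], dim=-1)

# Supervise only the response: label positions set to -100 are ignored by the loss.
labels = input_ids.clone()
labels[:, : prompt_ids.shape[-1]] = -100

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"SFT loss on this pair: {loss.item():.3f}")
```

In practice this runs over large curated instruction datasets with batching and learning-rate scheduling; the single-pair loop above only shows the loss-masking idea behind SFT.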
The Qwen-14B model, the flagship of the series, outperforms previous state-of-the-art (SOTA) open-source models of similar scale across 12 diverse datasets and remains competitive with proprietary models such as GPT-3.5. The Qwen models are particularly strong on language understanding and reasoning tasks, positioning them as robust contenders in the open-source LLM landscape.
Figure 2: Performance comparison of GPT-4, GPT-3.5, previous 13B SOTA, and Qwen-14B.
Encoding Efficiency and Human Evaluation
Qwen's tokenizer achieves higher encoding compression rates than comparable tokenizers across multiple languages, shortening sequences and thereby reducing both training and inference cost.
Figure 3: Encoding compression rates of different models, illustrating Qwen's high efficiency across languages.
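As a rough illustration of what "compression rate" means here, the sketch below counts UTF-8 bytes per token for a short bilingual sample, comparing the Qwen tokenizer with tiktoken's cl100k_base as an arbitrary reference; the sample text and the baseline choice are assumptions, not the paper's evaluation setup.

```python
# Illustrative compression-rate check (not the paper's benchmark setup).
import tiktoken
from transformers import AutoTokenizer

sample = "通义千问是一个大规模语言模型。Qwen is pretrained on trillions of tokens."

# Qwen's BPE tokenizer (requires the `tiktoken` package; loaded with trust_remote_code).
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
# Reference vocabulary for comparison (an arbitrary, well-known baseline).
cl100k = tiktoken.get_encoding("cl100k_base")

n_bytes = len(sample.encode("utf-8"))
for name, n_tokens in [
    ("Qwen", len(qwen_tok(sample).input_ids)),
    ("cl100k_base", len(cl100k.encode(sample))),
]:
    # More bytes per token means the same text fits into fewer tokens.
    print(f"{name:12s} {n_tokens:3d} tokens  {n_bytes / n_tokens:.2f} bytes/token")
```

Fewer tokens for the same text translate directly into shorter sequences, which is what the compression rates in Figure 3 capture.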
Human evaluations further validate the alignment and capability of the Qwen-Chat models. These assessments show that the RLHF-finetuned models outperform their SFT counterparts on average, producing responses that align more closely with human preferences.
Figure 4: Results from human evaluations comparing Qwen-7B and Qwen-14B models, showing the superior performance of RLHF versions.
Tool Use and Code Interpretation
The Qwen series integrates tool use via ReAct prompting and supports code interpreters for advanced task execution. Qwen-Chat models handle data manipulation and visualization effectively, using a Python interpreter to solve math problems and carry out data analysis.
Figure 5: Qwen-Chat using a code interpreter through ReAct prompting, highlighting its ability to analyze CSV data.
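For readers unfamiliar with the pattern, here is a minimal ReAct-style tool loop with a single Python tool. The `call_llm` callable, the prompt template, and the stop-sequence handling are hypothetical stand-ins for a request to a Qwen-Chat model, not Qwen's actual agent framework.

```python
# Minimal ReAct-style tool loop (a sketch; `call_llm` is a placeholder for any
# chat-completion call to a Qwen-Chat model).
import contextlib
import io

REACT_PROMPT = """Answer the question, using the tool when needed.

Tools:
python: executes Python code and returns stdout.

Use this format:
Thought: reasoning about what to do next
Action: python
Action Input: <code>
Observation: <tool output>
... (repeat as needed)
Final Answer: <answer>

Question: {question}
"""

def run_python(code: str) -> str:
    """Tiny stand-in for a code interpreter; real deployments sandbox this."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def react_loop(question: str, call_llm, max_steps: int = 5) -> str:
    transcript = REACT_PROMPT.format(question=question)
    for _ in range(max_steps):
        # Stop generation at "Observation:" so the tool result comes from the runtime.
        step = call_llm(transcript, stop=["Observation:"])
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action Input:" in step:
            code = step.split("Action Input:", 1)[1].strip()
            transcript += f"\nObservation: {run_python(code)}\n"
    return "No answer within step budget."
```

Cutting generation at "Observation:" ensures the tool output is injected by the runtime rather than hallucinated by the model, which is the key wiring detail behind ReAct-style agents.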
Conclusion
The Qwen series represents a significant advancement in the development of open LLMs, combining extensive pretraining data with alignment techniques such as SFT and RLHF. The specialized coding and mathematics models equip Qwen to tackle domain-specific challenges with proficiency approaching that of proprietary models. These contributions set a useful reference point for future LLM development, and the Qwen series offers a robust foundation for further research and application across AI domains.