
Textbooks Are All You Need II: phi-1.5 technical report

(2309.05463)
Published Sep 11, 2023 in cs.CL and cs.AI

Abstract

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing LLMs to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.

Benchmark results compare phi-1.5, its web-enhanced version, and other leading open-source LLMs.

Overview

  • Phi-1.5 is a language model with 1.3 billion parameters, trained on high-quality synthetic data for improved language understanding and efficient computation.

  • The model performs comparably to larger models in common sense reasoning and coding tasks, while producing less toxic and biased output.

  • The training of phi-1.5 incorporates an iterative fine-tuning process with a focus on data quality over quantity, yielding a training dataset roughly an order of magnitude smaller than those used for comparable models.

  • As an open-source model, phi-1.5 offers potential for more accessible and energy-efficient AI research, challenging the convention of larger models for advanced capabilities.

  • Phi-1.5 addresses AI ethical concerns and serves as a platform for exploring responsible AI development strategies, despite not completely eliminating problematic content generation.

Overview of phi-1.5

A noteworthy development in the field of language models is phi-1.5, a new model with 1.3 billion parameters. It builds on the idea that high-quality synthetic training data can produce language understanding and reasoning capabilities comparable to those of much larger models, at a fraction of the computational footprint.

Performance and Benchmarks

Phi-1.5 is tuned to excel at common sense reasoning and basic coding, tasks usually reserved for its larger counterparts. Benchmarked against models with up to 13 billion parameters, it demonstrates striking competence, particularly on multi-step reasoning problems. Importantly, the model's reliance on synthetic data, free of web content, also appears to reduce the generation of toxic and biased outputs, an issue plaguing many contemporary models.

Training Methodology

The team behind phi-1.5 designed an elaborate process involving the careful selection of seed topics, iterative fine-tuning, and strategic topic expansion, suggesting that data quality can be as crucial as data quantity. Remarkably, the resulting synthetic dataset, which forms the core of phi-1.5's training material, is nearly ten times smaller than those used for state-of-the-art models of similar caliber, pointing to efficient learning from carefully curated data.
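
The report does not release the data-generation pipeline itself, but the described ingredients (seed topics, topic expansion, textbook-style passages written by an existing LLM) can be illustrated with a short sketch. Everything below is hypothetical: the `call_llm` helper, the seed topics, and the prompts are placeholders rather than the actual recipe used for phi-1.5.

```python
# Hypothetical sketch of "textbook-quality" synthetic data generation.
# `call_llm` stands in for any existing LLM completion API; the real
# pipeline, prompts, and topic lists used for phi-1.5 are not public.

import random

SEED_TOPICS = [
    "fractions and ratios",
    "simple Python functions",
    "everyday physics of motion",
    "reading a bus timetable",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a completion call to a teacher LLM."""
    raise NotImplementedError("plug in your own LLM client here")

def expand_topics(topic: str, n: int = 5) -> list[str]:
    """Ask the teacher model for related sub-topics (topic expansion)."""
    reply = call_llm(
        f"List {n} narrow sub-topics of '{topic}' suitable for a short "
        f"textbook-style lesson, one per line."
    )
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]

def write_lesson(subtopic: str) -> str:
    """Generate one textbook-style passage with a worked example and exercise."""
    return call_llm(
        f"Write a short, self-contained textbook section about {subtopic}. "
        f"Include a worked example and one exercise with its solution."
    )

def build_corpus(num_docs: int) -> list[str]:
    """Grow a corpus by repeatedly expanding seed topics into lessons."""
    corpus: list[str] = []
    while len(corpus) < num_docs:
        topic = random.choice(SEED_TOPICS)
        for sub in expand_topics(topic):
            corpus.append(write_lesson(sub))
            if len(corpus) >= num_docs:
                break
    return corpus
```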

Implications of phi-1.5

Phi-1.5's open-source availability marks a step toward democratizing AI research. While it still lags behind the largest language models, it exhibits traits once exclusive to those behemoths, inviting broader experimentation and investigation. The model may also pave the way for more energy-efficient and globally accessible AI solutions, challenging the industry norm that larger, computationally intensive models are a necessity for advanced capabilities.
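
Because the model is openly released, it can be queried with standard tooling. The following is a minimal sketch using Hugging Face `transformers`; the checkpoint identifier `microsoft/phi-1_5` is assumed to be the published name, and older library versions may additionally require `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub identifier for the released phi-1.5 checkpoint.
model_id = "microsoft/phi-1_5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A grade-school style word problem to probe step-by-step reasoning.
prompt = (
    "Question: A farmer has 3 baskets with 12 apples each and gives away "
    "7 apples. How many apples does he have left? Let's think step by step.\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The word problem mirrors the kind of multi-step reasoning the report highlights, and greedy decoding keeps the output deterministic for quick inspection.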

Confronting AI Shortcomings

Notably, phi-1.5 does not fully eliminate the generation of problematic content. However, it shows promise in managing these risks better than similarly sized models trained solely on web data. The research team presents phi-1.5 as a testbed for methods aimed at mitigating ethical issues in AI, with a synthetic training regimen that could point toward a new direction for responsible AI development. As the search continues for models that balance environmental sustainability, ethical soundness, and capability, phi-1.5 stands out as a promising step toward a more balanced approach to AI scalability and sophistication.
