LLaMA: Open and Efficient Foundation Language Models

(arXiv:2302.13971)
Published Feb 27, 2023 in cs.CL

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Overview

  • Meta AI introduced LLaMA, a set of foundation language models prioritizing openness and efficiency, trained exclusively on publicly available datasets.

  • The LLaMA-13B model outperforms GPT-3 despite being significantly smaller, and the LLaMA-65B model is competitive with much larger language models.

  • LLaMA models utilize transformer architectures, are trained on large public datasets, and are optimized for both scalability and inferential efficiency.

  • Architectural refinements such as pre-normalization, the SwiGLU activation, and rotary embeddings improve model quality, while an optimized training implementation keeps throughput high.

  • Meta AI's release of LLaMA models emphasizes responsible AI with reported energy use and carbon emissions, advocating for sustainable AI development.

Introduction

Meta AI recently unveiled LLaMA, a collection of foundation language models designed for openness and efficiency. These models were trained exclusively on publicly available datasets, marking a significant departure from previous models that relied on proprietary and often inaccessible data. Remarkably, the LLaMA-13B variant outperforms GPT-3 (175 billion parameters) on most benchmarks despite being more than ten times smaller. At the larger end, LLaMA-65B competes closely with much larger models such as Chinchilla-70B and PaLM-540B.

Approach

The LLaMA models are built on the transformer architecture and trained on a vast textual corpus assembled entirely from public datasets spanning multiple domains. The training approach is informed by the Chinchilla scaling laws, which prescribe how to balance model size against dataset size for a fixed training compute budget. Because those laws ignore the cost of inference, which dominates when a language model is served at scale, LLaMA deliberately trains comparatively small models on more tokens than the compute-optimal recipe would suggest, trading extra training compute for cheaper and faster inference.
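
As a rough illustration of that trade-off, the sketch below compares LLaMA's reported token budgets with the common Chinchilla rule of thumb of roughly 20 training tokens per parameter, using the standard C ≈ 6·N·D approximation for training FLOPs. These are back-of-the-envelope estimates, not figures from Meta's training logs.

```python
# Back-of-the-envelope comparison: LLaMA token budgets vs. the Chinchilla
# heuristic of ~20 training tokens per parameter. Uses the common
# approximation C ~= 6 * N * D FLOPs for one training pass.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for N parameters seen over D tokens."""
    return 6 * n_params * n_tokens

# (parameters, training tokens) as reported in the paper
models = {
    "LLaMA-7B":  (7e9,  1.0e12),
    "LLaMA-13B": (13e9, 1.0e12),
    "LLaMA-33B": (33e9, 1.4e12),
    "LLaMA-65B": (65e9, 1.4e12),
}

for name, (n, d) in models.items():
    print(f"{name}: {d:.1e} tokens "
          f"(~{d / n:.0f} tokens/param vs ~20 'compute-optimal'), "
          f"~{train_flops(n, d):.2e} training FLOPs")
```

The point of the comparison: every LLaMA model sees far more than 20 tokens per parameter, which costs more training compute but yields a smaller, cheaper model at inference time.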

Optimization and Performance

LLaMA models range from 7 billion to 65 billion parameters and were trained on 1.0 to 1.4 trillion tokens. They incorporate several improvements over the original transformer architecture, including pre-normalization (RMSNorm), the SwiGLU activation function, and rotary positional embeddings. Training itself was carefully optimized, using a memory-efficient implementation of causal multi-head attention, selective activation checkpointing, and overlapping of activation computation with GPU-to-GPU communication. With these optimizations, Meta reports that training the largest model on 1.4 trillion tokens took roughly three weeks on 2,048 A100 80GB GPUs.
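
To make two of those architectural changes concrete, here is a minimal PyTorch sketch of RMSNorm-style pre-normalization and a SwiGLU feed-forward block (rotary embeddings, which replace absolute positional encodings and are applied to queries and keys, are omitted for brevity). Class and variable names are illustrative and not taken from Meta's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization: normalize the *input* of each transformer sub-layer
    with RMSNorm rather than normalizing its output with LayerNorm."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with the SwiGLU activation: a SiLU-gated
    linear unit in place of the usual ReLU MLP."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Usage: pre-normalize, transform, then add the residual connection.
dim = 512
norm = RMSNorm(dim)
ffn = SwiGLUFeedForward(dim, 4 * dim * 2 // 3)  # LLaMA uses a 2/3 * 4d hidden size
x = torch.randn(2, 16, dim)
out = x + ffn(norm(x))
```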

Comparative Evaluation and Impact

The evaluation of the LLaMA models spanned free-form generation and multiple-choice tasks across roughly twenty standard benchmarks, in both zero-shot and few-shot settings. Against established foundation models, LLaMA delivers compelling performance, often matching or beating much larger counterparts, paving the way for efficient yet powerful language models that are accessible to the research community. The release also emphasizes responsible AI practices and reflects an effort to democratize AI technologies.
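
For the multiple-choice benchmarks, each candidate completion is scored by its likelihood under the model, normalized by the completion's length, and the highest-scoring option is selected. The sketch below illustrates that scheme with a Hugging Face-style causal language model and tokenizer; the function names and the exact normalization are simplifying assumptions, not Meta's evaluation code.

```python
import torch

@torch.no_grad()
def score_choice(model, tokenizer, context: str, choice: str) -> float:
    """Length-normalized log-likelihood of `choice` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    n_choice_tokens = full_ids.shape[1] - ctx_ids.shape[1]

    logits = model(full_ids).logits                # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]                      # next-token targets
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    choice_lp = token_lp[:, -n_choice_tokens:].sum()
    return choice_lp.item() / max(len(choice), 1)  # normalize by character count

def pick_answer(model, tokenizer, context: str, choices: list[str]) -> str:
    """Return the candidate completion with the highest normalized score."""
    return max(choices, key=lambda c: score_choice(model, tokenizer, context, c))
```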

The conversation about AI's carbon footprint is also gaining momentum. Meta AI provided a candid breakdown of the energy usage and estimated carbon emissions associated with training the LLaMA models. By releasing these models openly, the intent is to reduce redundant energy expenditure in the AI community, as further training of similar scale may not be necessary. This move could set a precedent for more sustainable AI development practices in the future.
