BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

(2211.05100)
Published Nov 9, 2022 in cs.CL

Abstract

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
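Because the checkpoints are openly released, the model can be prompted through standard causal-language-model tooling. The following is a minimal sketch of loading the public bigscience/bloom checkpoint with the Hugging Face transformers library; the dtype, device placement, and example prompt are illustrative assumptions rather than the authors' evaluation setup.

```python
# Minimal sketch: prompting the released BLOOM checkpoint with Hugging Face
# transformers. Loading arguments (dtype, device_map) are illustrative
# assumptions; the full 176B model needs multiple large GPUs or offloading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"  # full 176B checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # BLOOM was trained in bfloat16
    device_map="auto",           # shard across available devices (requires accelerate)
)

# Zero-/few-shot use: the decoder-only model simply continues the prompt.
prompt = "Translate to French: I have a dream.\nTranslation:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Smaller released variants of the same family (for example, bigscience/bloom-560m) expose the same interface and are more practical for single-GPU experimentation.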
