
TACO: Topics in Algorithmic COde generation dataset (2312.14852v3)

Published 22 Dec 2023 in cs.AI

Abstract: We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the topics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark for code generation models. TACO includes competition-level programming questions to enhance or evaluate problem understanding and reasoning abilities in real-world programming scenarios. There are 25,443 and 1,000 coding problems in the training and test sets, respectively, as well as up to 1.55 million diverse solution answers. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on the Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and GitHub (https://github.com/FlagOpen/TACO).


Summary

  • The paper presents TACO, a novel dataset featuring 26,443 algorithmic problems with detailed metadata on topics, skills, and difficulty to enhance LLM training.
  • It employs rigorous parsing and deduplication methods, integrating competition-level challenges with an average of 202.3 test cases per problem for robust evaluation.
  • Evaluation with state-of-the-art models like GPT-4 reveals low pass rates on complex tasks, underscoring the dataset's potential to advance algorithmic code generation research.

A Study on the TACO Dataset for Algorithmic Code Generation

The paper introduces TACO, an extensive open-source dataset designed to strengthen training and evaluation in the algorithmic code generation domain. As the ability of LLMs to generate code from textual descriptions has advanced, the dataset arrives at a critical juncture, presenting challenges that go beyond basic programming problems. Its difficulty is underscored by the inclusion of competition-level tasks that demand deeper understanding and reasoning from contemporary models.

Main Contributions and Features

The TACO dataset is characterized by several key features:

  1. Scale and Composition: Comprising 26,443 problems, TACO is an extensive repository whose scope ranges from fundamental topics such as mathematics to advanced ones like graph theory and data structures. This scale surpasses previous datasets such as APPS and CodeContests in both problem count and the variety of Python solutions.
  2. Fine-grained Annotations: Each problem is supplemented with comprehensive metadata, including task topics, algorithm types, programming skills, and difficulty levels. This addresses a significant shortcoming of existing datasets by providing context that is vital for nuanced model training and evaluation (a loading sketch using these labels follows this list).
  3. Data Quality and Source Robustness: The dataset integrates problems from established competition platforms such as CodeChef, Codeforces, and HackerRank, reinforced by manual verification and careful parsing. A rigorous deduplication step ensures that solutions are unique and annotations are not repeated.
  4. Algorithmic and Skill-based Labeling: Problems carry detailed algorithmic labels spanning 36 distinct topics. These labels enable focused training, helping models identify and apply the correct methods for varied algorithmic challenges.
  5. Test Set Rigor and Diversity: The test set comprises 1,000 rigorously validated problems with an average of 202.3 test cases per problem, mitigating the test-set validity and false-positive issues of earlier datasets.
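To illustrate how these annotations can be put to work, here is a minimal sketch of loading TACO from the Hugging Face Hub and filtering by the fine-grained labels. The Hub ID comes from the abstract; the field names (`question`, `solutions`, `input_output`, `difficulty`, `skill_types`) and the `"EASY"` difficulty value follow the dataset card but are assumptions here, so consult the card for the current schema.

```python
import json
from datasets import load_dataset  # pip install datasets

# Load the TACO training split from the Hugging Face Hub.
train = load_dataset("BAAI/TACO", split="train")

# Keep only easy dynamic-programming problems. The "difficulty" and
# "skill_types" fields are assumed from the dataset card; the substring
# check works whether skill_types is a list or a serialized string.
dp_easy = train.filter(
    lambda ex: ex["difficulty"] == "EASY"
    and "Dynamic Programming" in ex["skill_types"]
)

sample = dp_easy[0]
print(sample["question"][:300])              # start of the problem statement
solutions = json.loads(sample["solutions"])  # list of reference solutions
tests = json.loads(sample["input_output"])   # {"inputs": [...], "outputs": [...]}
print(len(solutions), "solutions,", len(tests["inputs"]), "test cases")
```

Because every example carries topic, skill, and difficulty labels, the same filter pattern can carve out per-skill training subsets or per-difficulty evaluation buckets without any manual curation.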

Evaluation Methodology

The evaluation framework pairs state-of-the-art models such as Code Llama and StarCoder with pass@k metrics computed across diverse difficulty levels. The dataset's rigor is underscored by the finding that even advanced models like GPT-4 achieve relatively low pass rates on the more complex TACO tasks, highlighting the dataset's capacity to stress-test code generation models.
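For reference, pass@k is conventionally computed with the unbiased estimator popularized by the HumanEval work: generate n samples per problem, count the c samples that pass all test cases, and estimate the probability that a random draw of k samples contains at least one correct solution. Below is a minimal sketch of that standard formula, not code taken from the TACO repository:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: samples generated per problem, c: samples passing all tests,
    k: sampling budget being scored. Evaluated as a stable running
    product to avoid huge binomial coefficients."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 3 of which pass all test cases.
print(pass_at_k(200, 3, 1))   # ~0.015 (= c/n for k=1)
print(pass_at_k(200, 3, 10))  # ~0.14
```

Averaging this quantity over problems within a topic, skill, or difficulty bucket yields the per-label breakdowns that TACO's metadata makes possible.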

Implications and Future Directions

Practical Implications: For educators, TACO's detailed labeling offers a scaffold for curriculum design centered on algorithmic understanding. For model developers, the dataset's granularity supports building models with improved context comprehension and task-specific algorithm selection.

Theoretical Implications: From a broader research perspective, TACO introduces a platform to explore the capabilities of LLMs in understanding and generating algorithms, challenging the models to expand beyond learned patterns.

Future Developments: As models evolve, integrating more sophisticated neural architectures that leverage TACO's detailed annotations could yield models capable of approaching or surpassing human-level problem-solving in algorithmic contexts. Continued refinement and expansion of the dataset's labels and problem complexities promise to keep it at the forefront of code generation research tools.

In conclusion, TACO stands as a significant step toward sophisticated code generation datasets by providing an environment rich in both quantity and quality of data for comprehensive model evaluation and training. Its deployment promises to enhance both model capabilities and the depth of algorithmic understanding achievable within AI systems.
