AutoML using Metadata Language Embeddings (1910.03698v1)

Published 8 Oct 2019 in cs.LG, cs.CL, and stat.ML

Abstract: As a human choosing a supervised learning algorithm, it is natural to begin by reading a text description of the dataset and documentation for the algorithms you might use. We demonstrate that the same idea improves the performance of automated machine learning methods. We use language embeddings from modern NLP to improve state-of-the-art AutoML systems by augmenting their recommendations with vector embeddings of datasets and of algorithms. We use these embeddings in a neural architecture to learn the distance between best-performing pipelines. The resulting (meta-)AutoML framework improves on the performance of existing AutoML frameworks. Our zero-shot AutoML system using dataset metadata embeddings provides good solutions instantaneously, running in under one second of computation. Performance is competitive with AutoML systems OBOE, AutoSklearn, AlphaD3M, and TPOT when each framework is allocated a minute of computation. We make our data, models, and code publicly available.
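
The zero-shot idea in the abstract can be illustrated with a minimal sketch: embed a new dataset's text description, find the most similar previously seen description, and reuse the pipeline that worked best there. This is not the paper's method. The paper uses language embeddings from modern NLP and a neural architecture that learns a distance, whereas this sketch swaps in word-count vectors and plain cosine similarity. The names `embed`, `cosine`, `recommend`, and `KNOWN`, along with the dataset descriptions and pipeline labels, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a language embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical meta-training data: (dataset description, best-known pipeline).
KNOWN = [
    ("images of handwritten digits for classification",
     "standard_scaler -> svm"),
    ("tabular customer records for churn prediction",
     "one_hot_encoder -> gradient_boosting"),
    ("daily sensor readings for regression of temperature",
     "imputer -> ridge_regression"),
]


def recommend(description, known=KNOWN):
    """Return the pipeline attached to the most similar known description."""
    query = embed(description)
    _, pipeline = max(known, key=lambda item: cosine(query, embed(item[0])))
    return pipeline


if __name__ == "__main__":
    print(recommend("classification of handwritten letter images"))
```

Because the recommendation is a lookup over precomputed descriptions, no candidate pipeline is trained or evaluated at query time, which is the property that makes a zero-shot recommender fast.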
