
Metalearning with Very Few Samples Per Task (2312.13978v2)

Published 21 Dec 2023 in cs.LG and cs.DS

Abstract: Metalearning and multitask learning are two frameworks for solving a group of related learning tasks more efficiently than we could hope to solve each of the individual tasks on their own. In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d. from a metadistribution and need to output some common information that can be easily specialized to new tasks from the metadistribution. We consider a binary classification setting where tasks are related by a shared representation, that is, every task $P$ can be solved by a classifier of the form $f_{P} \circ h$ where $h \in H$ is a map from features to a representation space that is shared across tasks, and $f_{P} \in F$ is a task-specific classifier from the representation space to labels. The main question we ask is how much data do we need to metalearn a good representation? Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task. We focus on the regime where $n$ is extremely small. Our main result shows that, in a distribution-free setting where the feature vectors are in $\mathbb{R}^d$, the representation is a linear map from $\mathbb{R}^d \to \mathbb{R}^k$, and the task-specific classifiers are halfspaces in $\mathbb{R}^k$, we can metalearn a representation with error $\varepsilon$ using $n = k+2$ samples per task, and $d \cdot (1/\varepsilon)^{O(k)}$ tasks. Learning with so few samples per task is remarkable because metalearning would be impossible with $k+1$ samples per task, and because we cannot even hope to learn an accurate task-specific classifier with $k+2$ samples per task. Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
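
To make the shared-representation model concrete, the following is a minimal NumPy sketch of the data-generating process the abstract describes: a linear map $h(x) = Ax$ from $\mathbb{R}^d$ to $\mathbb{R}^k$ shared across tasks, a task-specific halfspace $f_P$ on top of it, and only $n = k+2$ labeled samples per task. The dimension choices, the Gaussian feature distribution, and the random draws of $A$ and $w_P$ are illustrative assumptions (the paper's guarantees are distribution-free), and no metalearning algorithm from the paper is implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 3   # ambient and representation dimensions (illustrative choices)
n = k + 2      # samples per task, the regime studied in the paper
t = 500        # number of tasks (the paper needs d * (1/eps)^{O(k)} tasks)

# Shared representation h(x) = A @ x, a linear map R^d -> R^k (unknown to the learner).
A = rng.standard_normal((k, d))

def sample_task(n_samples):
    """Draw one task: a halfspace w_P over the shared representation,
    plus n_samples labeled examples (x, sign(w_P . A x))."""
    w_P = rng.standard_normal(k)              # task-specific halfspace in R^k
    X = rng.standard_normal((n_samples, d))   # features (Gaussian here only for
                                              # illustration; the paper is distribution-free)
    y = np.sign(X @ A.T @ w_P)
    return X, y

# A metalearning dataset in this model: t tasks with only n = k + 2 samples each.
tasks = [sample_task(n) for _ in range(t)]
X0, y0 = tasks[0]
print(X0.shape, y0.shape)   # (5, 20) (5,) for k = 3
```

With only $k+2$ samples, no single task's halfspace can be learned accurately on its own; the point of the result is that pooling many such tiny tasks still suffices to recover the shared representation.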
