Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How (2306.03828v4)
Abstract: With the ever-increasing number of pretrained models, machine learning practitioners are continually faced with the decision of which pretrained model to use and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations on a series of datasets. To this end, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.
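To make the high-level idea concrete, below is a minimal sketch (in Python, with hypothetical names; this is not the Quick-Tune API) of a joint search space over pretrained models and finetuning hyperparameters, explored with a multi-fidelity predictor: candidates are scored at a cheap budget, the most promising ones are promoted to larger budgets, and the best (model, hyperparameter) pair is returned. The predictor here is a random stub standing in for the meta-learned performance predictor described in the abstract.

```python
# Illustrative sketch (hypothetical names, not the Quick-Tune implementation):
# joint search over pretrained models and finetuning hyperparameters,
# guided by a multi-fidelity performance predictor.
import random

# Joint search space: which pretrained model, and how to finetune it.
MODELS = ["beit_base", "xcit_small", "efficientnet_b4"]   # example model hub
HP_GRID = [
    {"lr": lr, "pct_to_freeze": p}
    for lr in (1e-4, 1e-3, 1e-2)
    for p in (0.0, 0.5, 0.9)
]

def predict_performance(dataset_features, model, hp, budget):
    """Stand-in for a meta-learned multi-fidelity predictor: it would map
    (dataset meta-features, model, hyperparameters, epoch budget) to an
    estimated validation accuracy. Here it is just a random stub."""
    return random.random()

def quick_tune_sketch(dataset_features, budgets=(1, 4, 16), top_k=4):
    """Greedy multi-fidelity search: score all candidates at the lowest
    budget, keep the top_k, and re-score the survivors at higher budgets."""
    candidates = [(m, hp) for m in MODELS for hp in HP_GRID]
    for budget in budgets:
        candidates = sorted(
            candidates,
            key=lambda c: predict_performance(dataset_features, c[0], c[1], budget),
            reverse=True,
        )[:top_k]                               # promote promising candidates
    return candidates[0]                        # best (model, hyperparameters)

if __name__ == "__main__":
    best_model, best_hp = quick_tune_sketch(dataset_features={"num_classes": 10})
    print(best_model, best_hp)
```

In the actual method, the random stub would be replaced by the predictor meta-learned on the 20k-configuration meta-dataset, and the candidate selection would balance exploration and exploitation rather than ranking greedily.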