Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation (2306.16916v1)
Abstract: We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) in which the tasks follow a sequential order. Unlike in state-of-the-art transfer HPO, the assumption is that each task is most correlated with those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance, tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences from related problems, and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account on ten benchmarks. The benchmarks are in the setting of gradually accumulating data, and span XGBoost, random forest, approximate k-nearest neighbour, elastic net, support vector machines, and a separate real-world-motivated optimisation problem. We open-source the benchmarks to foster future research on ordered transfer HPO.
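To make the setting concrete, below is a minimal Python sketch of ordered transfer HPO in the accumulating-data setting: each task's search is warm-started with the best configurations found on the immediately preceding task. Everything here is an illustrative assumption rather than the paper's exact method: the `evaluate` objective (a quadratic whose optimum drifts slowly across tasks), the random-search base optimiser, and the warm-start rule are all stand-ins chosen to show the ordered structure.

```python
# Minimal OTHPO sketch (illustrative, not the paper's method).
# Assumption: tasks arrive in order and each task's objective is most
# similar to the one immediately before it, mimicking retuning as data
# gradually accumulates.
import numpy as np

def evaluate(config, task):
    """Hypothetical objective: stands in for the validation error of a
    model with hyperparameter `config` on the data available at step
    `task`. The optimum drifts slowly, so consecutive tasks correlate."""
    optimum = 0.5 + 0.02 * task
    return (config - optimum) ** 2

def othpo(num_tasks=10, budget=20, num_transferred=3, seed=0):
    rng = np.random.default_rng(seed)
    best_per_task = []
    carried = []  # configurations transferred from the previous task
    for task in range(num_tasks):
        # Warm start: re-evaluate the previous task's best configs first,
        # then spend the remaining budget on fresh random candidates.
        candidates = carried + list(rng.uniform(0.0, 1.0, budget - len(carried)))
        scores = [evaluate(c, task) for c in candidates]
        order = np.argsort(scores)
        best_per_task.append(candidates[order[0]])
        carried = [candidates[i] for i in order[:num_transferred]]
    return best_per_task

print(othpo())
```

Carrying forward only the few best configurations from the previous task, rather than pooling evidence from all past tasks as standard transfer HPO does, is the key design choice the ordered setting motivates: under the paper's assumption, the most recent task is the most informative one.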