Multi-Objective Optimization for Sparse Deep Multi-Task Learning (2308.12243v4)
Abstract: Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent, while efficaciously handling constraints. Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, if we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN
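The two-stage idea in the abstract (scalarize, then solve the resulting constrained single-objective problem with an Augmented Lagrangian) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the plain weighted Chebyshev scalarization in epigraph form (minimize t subject to w_i (f_i(x) - z_i) <= t) rather than the modified variant the paper employs, two hypothetical scalar objectives f_1(x) = x^2 and f_2(x) = (x - 2)^2 instead of DNN losses, and plain gradient descent in place of Adam or SGD. The function name `chebyshev_al` and all parameter values are illustrative choices.

```python
def chebyshev_al(weights, ideal=(0.0, 0.0), mu=5.0, lr=0.01,
                 outer_steps=20, inner_steps=500):
    """Toy weighted Chebyshev scalarization solved by an augmented Lagrangian.

    Assumed problem (not from the paper): f1(x) = x^2, f2(x) = (x - 2)^2.
    Epigraph form: minimize t  subject to  g_i(x, t) = w_i (f_i(x) - z_i) - t <= 0.
    """
    objectives = (lambda x: x ** 2, lambda x: (x - 2.0) ** 2)
    derivatives = (lambda x: 2.0 * x, lambda x: 2.0 * (x - 2.0))

    def constraint(i, x, t):
        return weights[i] * (objectives[i](x) - ideal[i]) - t

    x, t = 0.0, 0.0          # primal variables
    lam = [0.0, 0.0]         # one multiplier per inequality constraint

    for _ in range(outer_steps):
        # Inner loop: gradient descent on the augmented Lagrangian
        #   t + 1/(2 mu) * sum_i (max(0, lam_i + mu g_i)^2 - lam_i^2)
        for _ in range(inner_steps):
            grad_x, grad_t = 0.0, 1.0
            for i in range(2):
                m = max(0.0, lam[i] + mu * constraint(i, x, t))
                grad_x += m * weights[i] * derivatives[i](x)
                grad_t -= m
            x -= lr * grad_x
            t -= lr * grad_t
        # Outer loop: projected multiplier update
        lam = [max(0.0, lam[i] + mu * constraint(i, x, t)) for i in range(2)]

    return x, t, lam


if __name__ == "__main__":
    for w in [(0.5, 0.5), (0.8, 0.2)]:
        x, t, lam = chebyshev_al(w)
        print(w, round(x, 4), round(t, 4))
```

For this toy problem the Chebyshev minimizer can be checked by hand: both weighted objectives are equal at the optimum, so equal weights give x = 1 with t = 0.5, and weights (0.8, 0.2) give x = 2/3. Sweeping the weight vector traces out different trade-off points, which is the sense in which the scalarization reduces the multi-objective problem to a sequence of single-objective ones.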