AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning (1911.12423v2)

Published 27 Nov 2019 in cs.CV and cs.LG

Abstract: Multi-task learning is an open and challenging problem in computer vision. The typical way of conducting multi-task learning with deep neural networks is either through handcrafted schemes that share all initial layers and branch out at an adhoc point, or through separate task-specific networks with an additional feature sharing/fusion mechanism. Unlike existing methods, we propose an adaptive sharing approach, called AdaShare, that decides what to share across which tasks to achieve the best recognition accuracy, while taking resource efficiency into account. Specifically, our main idea is to learn the sharing pattern through a task-specific policy that selectively chooses which layers to execute for a given task in the multi-task network. We efficiently optimize the task-specific policy jointly with the network weights, using standard back-propagation. Experiments on several challenging and diverse benchmark datasets with a variable number of tasks well demonstrate the efficacy of our approach over state-of-the-art methods. Project page: https://cs-people.bu.edu/sunxm/AdaShare/project.html.

Summary

The paper introduces AdaShare, a novel adaptive mechanism that dynamically selects which layers to share across tasks to enhance multi-task learning.
It uses Gumbel-Softmax sampling and regularization to learn efficient, task-specific sharing policies by standard back-propagation, without reinforcement learning or an additional policy network.
Extensive experiments show AdaShare outperforms traditional models, achieving higher accuracy with fewer parameters on benchmarks like NYU v2 and CityScapes.

This document analyzes the research paper "AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning" by Ximeng Sun et al. The paper addresses a central challenge in multi-task learning (MTL) with deep neural networks by proposing a method called AdaShare. MTL matters in computer vision because it optimizes multiple related tasks simultaneously, which can improve generalization and reduce training cost. The crux of efficient MTL is deciding which parameters or layers should be shared among tasks to maximize performance while minimizing resource usage. AdaShare addresses this problem with an adaptive, learnable approach that aims to be more efficient than existing methods.

Summary of Contributions

Adaptive Feature Sharing: AdaShare learns, for each task, which layers of the multi-task network to execute and which to skip. A layer selected by several tasks is shared among them, while a layer selected by a single task stays task-specific. This policy is optimized jointly with the network parameters.
Efficient Optimization: The task-specific sharing policy is learned concurrently with the network weights using Gumbel-Softmax sampling. This approach facilitates back-propagation without necessitating complex reinforcement learning approaches or additional networks, promoting computational efficiency.
Regularization for Efficiency and Sharing: The paper incorporates additional regularization terms that promote sparsity in layer execution and encourage positive sharing, which ameliorates negative transfer effects and ensures efficiency in parameter utilization.
Curriculum Learning Strategy: AdaShare integrates a curriculum learning-inspired strategy where the decision space is gradually expanded, fostering more stable optimization trajectories and more robust learning of task-specific policies.
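
The select-or-skip policy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gumbel_softmax`, `sample_policy`, `shared_layers`, `forward`), the two-logit `[select, skip]` layout, and the toy scalar "blocks" are assumptions made for illustration. It covers only the forward sampling step in pure Python; in an autodiff framework, the relaxed probabilities returned by the Gumbel-Softmax are what let gradients reach the policy logits.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return a relaxed (soft) categorical sample for the given logits."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so both logarithms below stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_policy(policy_logits, tau, rng):
    """policy_logits[task][layer] = [select_logit, skip_logit] (assumed layout).

    Returns policy[task][layer] = 1 to execute the layer, 0 to skip it.
    """
    policy = []
    for task_logits in policy_logits:
        decisions = []
        for pair in task_logits:
            probs = gumbel_softmax(pair, tau, rng)
            decisions.append(1 if probs[0] >= probs[1] else 0)
        policy.append(decisions)
    return policy


def shared_layers(policy):
    """A layer counts as shared here when every task executes it."""
    num_layers = len(policy[0])
    return [all(task[l] == 1 for task in policy) for l in range(num_layers)]


def forward(x, blocks, decisions):
    """Run residual blocks for one task, skipping those the policy drops."""
    for block, keep in zip(blocks, decisions):
        if keep:
            x = x + block(x)  # residual block: identity plus transformation
        # when keep == 0 the block is bypassed and x passes through unchanged
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    # Two tasks, three layers; the logits are hypothetical learned values.
    logits = [
        [[2.0, -2.0], [0.0, 0.0], [-2.0, 2.0]],
        [[2.0, -2.0], [1.0, -1.0], [2.0, -2.0]],
    ]
    policy = sample_policy(logits, tau=1.0, rng=rng)
    print("policy:", policy)
    print("shared:", shared_layers(policy))
```

Using two logits per task and layer keeps the decision a plain two-way categorical choice, which is the setting where the Gumbel-Softmax relaxation applies directly. The temperature `tau` controls how close the relaxed sample is to a hard one-hot choice.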

Results and Experiments

The paper evaluates AdaShare through extensive experiments on several benchmark datasets, including NYU v2, CityScapes, and Tiny-Taskonomy. The reported results indicate that AdaShare outperforms both traditional multi-task learning baselines and state-of-the-art methods such as Cross-Stitch Networks, Sluice Networks, NDDR-CNN, MTAN, and DEN in recognition accuracy, parameter efficiency, and computational cost.

On NYU v2 2-Task and 3-Task setups as well as CityScapes 2-Task Learning, AdaShare consistently achieves superior task performance with considerably reduced parameter counts, outperforming models that rely on hard or soft parameter sharing mechanisms.
On Tiny-Taskonomy 5-Task Learning, AdaShare exploits task correlations through its flexible sharing pattern, improving performance on a dataset that requires learning semantic, 3D, and 2D information simultaneously.

Implications and Future Directions

Practically, AdaShare facilitates the deployment of multi-task models in resource-constrained environments such as mobile platforms and autonomous systems by enabling efficient resource utilization without compromising task performance. Theoretically, the adaptive, learnable sharing paradigm advances our understanding of efficient model design in the context of multi-task learning, highlighting the importance of task-specific resource allocation.

Looking forward, AdaShare opens up several avenues for future exploration. Extending the framework to include granular, channel-level decisions rather than layer-wise selections may yield even greater efficiencies. Moreover, integrating this approach into broader neural architecture search frameworks could automate the discovery of optimal MTL configurations across diverse task domains, further augmenting the applicability and robustness of multi-task solutions in AI.

In conclusion, AdaShare represents a significant advancement in multi-task learning by offering a robust method to dynamically and efficiently share network components across tasks, optimizing both performance and efficiency in complex, real-world applications.
