Cross-stitch Networks for Multi-task Learning (1604.03539v1)

Published 12 Apr 2016 in cs.CV and cs.LG

Abstract: Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, that do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.

Citations (1,270)

Summary

  • The paper introduces cross-stitch units that blend activations to effectively share representations across multiple tasks.
  • It presents a straightforward, end-to-end training approach that outperforms traditional single-task and ensemble methods.
  • Experiments on NYU-v2 and PASCAL VOC demonstrate notable gains in segmentation, detection, and attribute prediction tasks.

Cross-stitch Networks for Multi-task Learning: An Overview

The paper "Cross-stitch Networks for Multi-task Learning" by Ishan Misra et al. presents an innovative approach to multi-task learning in Convolutional Networks (ConvNets). The authors introduce a new unit—referred to as the cross-stitch unit—which facilitates the learning of shared representations across multiple tasks. This method addresses the limitations of existing multi-task learning approaches, particularly the dependency on task-specific network architectures that do not generalize well.

Introduction and Motivation

ConvNets have significantly improved performance in various recognition tasks, such as classification, detection, and segmentation. The shared representations learned by these networks have been instrumental in their success. Extending this concept to multi-task learning, where multiple tasks share supervisory signals, promises further performance gains. However, existing techniques often rely on task-specific network architectures, leading to a lack of generalization and a cumbersome process of architecture selection.

Methodology: Cross-stitch Units

To overcome these challenges, the paper proposes the use of cross-stitch units. These units combine activations from multiple networks, allowing the model to learn an optimal blend of shared and task-specific representations. The cross-stitch unit achieves this by modeling shared representations as linear combinations of input activation maps from different tasks. These combinations are parameterized by weights that can be learned during the training process.
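The linear combination described above can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: for two tasks A and B, a 2x2 matrix `alpha` (a name chosen here for illustration) mixes the two input activation maps, and each entry of `alpha` would be learned jointly with the network weights.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Cross-stitch unit sketch: alpha[i, j] weights the contribution
    of input activation j to the output activation for task i."""
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b

# alpha is typically initialized near the identity, so each task starts
# out relying mostly on its own activations; training then adjusts how
# much is shared.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
x_a = np.ones((2, 2))    # toy activation map from task A's network
x_b = np.zeros((2, 2))   # toy activation map from task B's network
out_a, out_b = cross_stitch(x_a, x_b, alpha)
```

With these toy inputs, task A's output is pulled slightly toward task B's (all-zero) activations and vice versa, which is exactly the shared-vs-task-specific trade-off the unit parameterizes.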

The cross-stitch units are incorporated into the network at various layers, and the entire model is trained end-to-end. This approach simplifies the process of multi-task learning by eliminating the need to manually design and test numerous task-specific network architectures.
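To make the layer-wise placement concrete, the following sketch (hypothetical code, with toy stand-in layers rather than real convolutions) runs two single-task streams over the same input and applies a cross-stitch recombination after each layer, as the paper does after pooling layers of two network streams.

```python
import numpy as np

def layer(x, scale):
    """Toy stand-in for one conv layer: scale, then ReLU."""
    return np.maximum(0.0, scale * x)

def cross_stitch(x_a, x_b, alpha):
    """Linear recombination of the two tasks' activations (2x2 alpha)."""
    return (alpha[0, 0] * x_a + alpha[0, 1] * x_b,
            alpha[1, 0] * x_a + alpha[1, 1] * x_b)

def forward(x, alphas):
    """Two stitched task streams over the same input, one unit per layer."""
    x_a = x_b = x
    for alpha in alphas:
        x_a = layer(x_a, 1.0)   # task-A stream
        x_b = layer(x_b, 0.5)   # task-B stream
        x_a, x_b = cross_stitch(x_a, x_b, alpha)
    return x_a, x_b

# Identity-initialized units: each stream initially keeps only its own
# activations; end-to-end training would move the alphas off identity
# wherever sharing helps.
alphas = [np.eye(2), np.eye(2)]
out_a, out_b = forward(np.array([1.0, -1.0]), alphas)
```

Because every operation here is differentiable, gradients from both task losses flow through the `alpha` entries as well as the layer weights, which is what allows the degree of sharing to be learned rather than fixed by the architecture.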

Experimental Setup and Results

The authors conduct extensive experiments to evaluate the effectiveness of cross-stitch units. They focus on two main pairs of tasks: semantic segmentation and surface normal prediction on the NYU-v2 dataset, and object detection and attribute prediction on the PASCAL VOC 2008 dataset. The models are benchmarked against strong baselines, including single-task networks, ensemble methods, and the best performing split architecture found through brute-force enumeration.

Semantic Segmentation and Surface Normal Prediction

For the tasks of semantic segmentation and surface normal prediction, the cross-stitch network outperforms the baseline single-task and ensemble networks. The cross-stitched model also surpasses the best-performing split architecture, demonstrating its ability to adaptively learn the optimal level of shared and task-specific representations.

Object Detection and Attribute Prediction

In the case of object detection and attribute prediction, the cross-stitch network shows significant improvement in attribute prediction performance, particularly for data-starved categories. This suggests that cross-stitch units efficiently leverage shared representations to boost performance in tasks with limited training data.

Discussion and Implications

The paper's results indicate that cross-stitch units effectively generalize across different types of tasks, providing a robust and scalable solution for multi-task learning in ConvNets. The approach also alleviates the burden of architecture selection, making it a practical choice for a wide range of applications.

Future Work

Future research could explore the properties of cross-stitch units further, such as determining the optimal layers at which to introduce these units and exploring different strategies for constraining their weights. Additionally, extending this methodology to tasks involving multiple input modalities, such as combining image and depth data, presents an interesting avenue for further investigation.

Conclusion

The cross-stitch network method provides a principled and flexible approach to multi-task learning in ConvNets. By dynamically learning the best combination of shared and task-specific representations, it achieves superior performance across various tasks and datasets. This method stands out for its ability to generalize across different tasks without the need for task-specific architecture tuning, thus representing a significant advancement in the domain of multi-task learning.