Transferability-Guided Cross-Domain Cross-Task Transfer Learning (2207.05510v2)

Published 12 Jul 2022 in cs.CV, cs.AI, and cs.LG

Abstract: We propose two novel transferability metrics, F-OTCE (Fast Optimal Transport based Conditional Entropy) and JC-OTCE (Joint Correspondence OTCE), to evaluate how much a source model (task) can benefit the learning of a target task and to learn more transferable representations for cross-domain cross-task transfer learning. Unlike the existing metric that requires evaluating empirical transferability on auxiliary tasks, our metrics are auxiliary-free and can therefore be computed much more efficiently. Specifically, F-OTCE estimates transferability by first solving an Optimal Transport (OT) problem between the source and target distributions, and then using the optimal coupling to compute the Negative Conditional Entropy between source and target labels. It can also serve as a loss function to maximize the transferability of the source model before finetuning on the target task. Meanwhile, JC-OTCE improves the robustness of F-OTCE by including label distances in the OT problem, though it may incur additional computation cost. Extensive experiments demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free metrics by 18.85% and 28.88%, respectively, in correlation coefficient with the ground-truth transfer accuracy. By eliminating the training cost of auxiliary tasks, the two metrics reduce the total computation time of the previous method from 43 minutes to 9.32 s and 10.78 s, respectively, for a pair of tasks. When used as a loss function, F-OTCE yields consistent improvements in the transfer accuracy of the source model in few-shot classification experiments, with up to 4.41% accuracy gain.
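The F-OTCE pipeline described in the abstract has two stages: solve an entropic OT problem between source and target feature distributions, then use the resulting coupling as a soft correspondence to estimate the negative conditional entropy of target labels given source labels. A minimal sketch of that idea follows, assuming uniform marginals, a squared-Euclidean cost, and a hand-rolled Sinkhorn solver; the function names (`sinkhorn`, `f_otce`) and hyperparameters are illustrative, not the paper's reference implementation.

```python
import numpy as np

def sinkhorn(C, reg=0.1, n_iter=200):
    """Entropic OT with uniform marginals via Sinkhorn iterations (illustrative)."""
    n, m = C.shape
    a, b = np.ones(n) / n, np.ones(m) / m      # uniform source/target marginals
    K = np.exp(-C / reg)                        # Gibbs kernel
    v = np.ones(m) / m
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]          # optimal coupling pi

def f_otce(Xs, ys, Xt, yt, reg=0.1):
    """Sketch of an F-OTCE-style score: OT coupling -> negative conditional entropy.

    Higher (closer to 0) suggests the source labels predict the target labels
    well under the optimal correspondence, i.e. better transferability.
    """
    # squared Euclidean cost between source and target features, normalized
    C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    pi = sinkhorn(C / C.max(), reg)
    # joint label distribution induced by the coupling
    cs, ct = np.unique(ys), np.unique(yt)
    P = np.zeros((len(cs), len(ct)))
    for i, a in enumerate(cs):
        for j, b in enumerate(ct):
            P[i, j] = pi[np.ix_(ys == a, yt == b)].sum()
    P /= P.sum()
    Ps = P.sum(axis=1, keepdims=True)           # marginal over source labels
    nz = P > 0
    # negative conditional entropy  -H(Y_t | Y_s) = sum P(ys,yt) log P(yt|ys)
    return float((P[nz] * np.log((P / Ps)[nz])).sum())
```

As a sanity check, transferring a task to itself should score near zero (the coupling maps each class onto itself), while scrambling the target labels should drive the score toward -log(num_classes), which matches the intuition that the metric tracks how informative source labels are about target labels.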
