More Synergy, Less Redundancy: Exploiting Joint Mutual Information for Self-Supervised Learning (2307.00651v1)

Published 2 Jul 2023 in cs.CV

Abstract: Self-supervised learning (SSL) is now a serious competitor to supervised learning, even though it requires no data annotation. Several baselines have attempted to make SSL models exploit information about the data distribution and become less dependent on the augmentation effect. However, there is no clear consensus on whether maximizing or minimizing the mutual information between representations of augmented views practically improves or degrades the performance of SSL models. This paper is a fundamental study in which we investigate the role of mutual information in SSL and reformulate the problem of SSL from a new perspective on mutual information. To this end, we consider joint mutual information through the lens of partial information decomposition (PID) as a key step toward reliable multivariate information measurement. PID enables us to decompose joint mutual information into three components: unique information, redundant information, and synergistic information. Our framework aims to minimize the redundant information between views and the desired target representation while simultaneously maximizing the synergistic information. Our experiments lead to a re-calibration of two redundancy-reduction baselines and a proposal for a new SSL training protocol. Extensive experimental results on multiple datasets and two downstream tasks demonstrate the effectiveness of this framework.
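
To make the unique/redundant/synergistic split concrete, the sketch below computes a Williams-Beer style PID (the I_min redundancy measure, arXiv:1004.2515) for a toy discrete system in which the target is the XOR of two binary sources, a case where all of the joint information is synergistic. This is a minimal illustration of the decomposition the abstract refers to, not the estimator or training objective used in the paper; the XOR distribution and the helper functions are illustrative assumptions.

```python
# Toy partial information decomposition (PID) with the Williams-Beer I_min
# redundancy measure. Minimal sketch for small discrete variables only;
# NOT the method of the paper. The joint distribution below (Y = X1 XOR X2)
# is an illustrative assumption chosen because its information is purely
# synergistic: neither source alone tells us anything about Y.
import itertools
import math
from collections import defaultdict

# Joint distribution p(x1, x2, y) for Y = X1 XOR X2 with uniform binary inputs.
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in itertools.product((0, 1), repeat=2)}

def marginal(dist, keep):
    """Marginalize a dict {(x1, x2, y): prob} onto the given index tuple."""
    out = defaultdict(float)
    for event, prob in dist.items():
        out[tuple(event[i] for i in keep)] += prob
    return out

def mutual_information(dist, a_idx, b_idx):
    """I(A; B) in bits between the index groups a_idx and b_idx."""
    pa, pb = marginal(dist, a_idx), marginal(dist, b_idx)
    pab = marginal(dist, a_idx + b_idx)
    return sum(prob * math.log2(prob / (pa[ab[:len(a_idx)]] * pb[ab[len(a_idx):]]))
               for ab, prob in pab.items() if prob > 0)

def specific_information(dist, y, src_idx, y_idx=(2,)):
    """I(Y = y; X_src): information a specific outcome y gains from one source."""
    py = marginal(dist, y_idx)
    p_src = marginal(dist, src_idx)
    p_joint = marginal(dist, src_idx + y_idx)
    total = 0.0
    for a, pa in p_src.items():
        pay = p_joint.get(a + (y,), 0.0)
        if pay > 0:
            total += (pay / py[(y,)]) * math.log2(pay / (pa * py[(y,)]))
    return total

# Williams-Beer redundancy: expected minimum specific information over sources.
py = marginal(p, (2,))
redundancy = sum(py[(y,)] * min(specific_information(p, y, (0,)),
                                specific_information(p, y, (1,)))
                 for (y,) in py)

i_x1 = mutual_information(p, (0,), (2,))       # I(X1; Y)
i_x2 = mutual_information(p, (1,), (2,))       # I(X2; Y)
i_joint = mutual_information(p, (0, 1), (2,))  # I(X1, X2; Y)

unique_1 = i_x1 - redundancy
unique_2 = i_x2 - redundancy
synergy = i_joint - unique_1 - unique_2 - redundancy

print(f"Redundancy: {redundancy:.3f} bits")  # 0.000 for XOR
print(f"Unique(X1): {unique_1:.3f} bits")    # 0.000
print(f"Unique(X2): {unique_2:.3f} bits")    # 0.000
print(f"Synergy:    {synergy:.3f} bits")     # 1.000: all information is synergistic
```

Swapping in a different joint distribution (for example, Y = X1 with X2 a noisy copy of X1) shifts the mass from the synergy term into the unique and redundant terms, which is the kind of trade-off the paper's objective is designed to steer: less redundancy between views, more synergy.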
