Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition (2405.03712v1)

Published 4 May 2024 in cs.LG, cs.AI, cs.NE, and cs.CR

Abstract: In the past, research on a single low-dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first method to use different activation functions in a network. Based on the existing activation functions in the current network, an adversarial function with opposite derivative image properties is constructed, and the two are alternately used as activation functions for different network layers. For complex situations, we propose a method of high-dimensional function graph decomposition (HD-FGD), which divides a function into different parts that then pass through a linear layer. After integrating the inverse of the partial derivatives of each decomposed term, we obtain its adversarial function by referring to the computational rules of the decomposition process. Using the network adversarial method or HD-FGD alone can effectively replace the traditional MLP + activation function mode. Through the above methods, we achieve a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without any adverse effects. We will release the code as open source after the conference review process is completed.
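The abstract describes alternating a base activation with a constructed "adversarial" counterpart whose derivative has the opposite shape, layer by layer, but it does not give the exact construction or the HD-FGD procedure. The snippet below is only a minimal, hypothetical sketch of that alternating-activation idea: the `MirrorReLU` pairing (derivative 1 where ReLU's is 0 and vice versa) and the `adversarial_mlp` helper are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the alternating "network adversarial" activation idea.
# The ReLU / MirrorReLU pairing below is an illustrative assumption; the paper's
# concrete adversarial-function construction and HD-FGD are not specified in the abstract.
import torch
import torch.nn as nn


class MirrorReLU(nn.Module):
    """Illustrative counterpart of ReLU: min(x, 0), whose derivative is 1 for
    x < 0 and 0 for x > 0, i.e. the opposite of ReLU's derivative image."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x, max=0.0)


def adversarial_mlp(dims: list[int]) -> nn.Sequential:
    """Build an MLP whose hidden layers alternate between the base activation
    (ReLU) and its illustrative adversarial counterpart (MirrorReLU)."""
    layers: list[nn.Module] = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:  # no activation after the output layer
            layers.append(nn.ReLU() if i % 2 == 0 else MirrorReLU())
    return nn.Sequential(*layers)


if __name__ == "__main__":
    model = adversarial_mlp([16, 64, 64, 10])
    out = model(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 10])
```

The sketch only captures the layer-wise alternation pattern; how the paper derives the adversarial function from a given activation (and how HD-FGD handles more complex functions) would come from the full text and the authors' released code.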
