A Survey of Deep Learning for Scientific Discovery (2003.11755v1)

Published 26 Mar 2020 in cs.LG and stat.ML

Abstract: Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.

Citations (118)

Summary

A Survey of Deep Learning for Scientific Discovery

Overview of the Paper

The paper "A Survey of Deep Learning for Scientific Discovery" takes a comprehensive look at how advances in deep neural networks can be applied across various scientific domains. With the increasing complexity and volume of data being generated in scientific fields, deep learning offers significant potential for breakthroughs in these areas. This survey aims to demystify the application of deep learning in scientific settings by providing a detailed examination of prevalent models, associated tasks, training methods, and techniques to enhance data efficiency and interpretability.

Core Deep Learning Techniques

Deep Learning Workflow

The deep learning workflow typically consists of three key stages: data preparation, learning, and validation (Figure 1). Data preparation involves steps like collection, labeling, and preprocessing. The learning component focuses on selecting appropriate models and tasks, while validation encompasses performance evaluations and analysis of model behavior.

Figure 1: Schematic of a typical deep learning workflow.
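
To make these three stages concrete, here is a minimal end-to-end sketch using scikit-learn on a toy digits dataset. The specific library, model, and dataset are illustrative choices, not ones prescribed by the survey.

```python
# A minimal sketch of the three-stage workflow (illustrative choices only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# 1. Data preparation: collect, split, and preprocess.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Learning: choose a model and task, then fit to the training data.
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
model.fit(X_train, y_train)

# 3. Validation: evaluate performance on held-out data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```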

Models and Tasks for Scientific Applications

The survey highlights various models such as Convolutional Neural Networks (CNNs) for visual data, Transformers for sequential data, and Graph Neural Networks (GNNs) for graph-structured data. Tasks pertinent to these models include image classification, object detection, semantic segmentation, and sequence-to-sequence prediction, offering a spectrum of applications from medical imaging to natural language processing.

Figure 2: The Supervised Learning process for training neural networks.
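
The supervised process in Figure 2 reduces to a loop of predicting, comparing predictions against labels, and updating parameters. The minimal PyTorch sketch below illustrates that loop; the tiny CNN and the random stand-in batch are assumptions for illustration, not details from the paper.

```python
# A minimal supervised-learning loop in PyTorch (illustrative model and data).
import torch
import torch.nn as nn

model = nn.Sequential(                      # small CNN for 28x28 grayscale images
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),            # 10-way classification head
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(32, 1, 28, 28)         # stand-in batch of labeled data
labels = torch.randint(0, 10, (32,))

for step in range(5):                       # predict, compare to labels, update
    logits = model(images)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```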

Self-Supervised Learning

Self-supervised learning, which derives training labels automatically from the data itself, is presented as a pivotal method for leveraging vast unlabeled datasets. This approach is exemplified by pretext tasks such as predicting image rotations or language modeling, which improve model robustness and feature extraction.

Figure 3: Training neural networks with Self-Supervision.
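
As an illustration of the rotation pretext task mentioned above, the sketch below rotates unlabeled images by a random multiple of 90 degrees and trains a network to predict the rotation; the encoder architecture and random data are illustrative assumptions.

```python
# A sketch of rotation-prediction self-supervision (illustrative model/data).
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Create (rotated image, rotation label) pairs from unlabeled images."""
    rotations = torch.randint(0, 4, (images.size(0),))      # label in {0,1,2,3}
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, rotations)])
    return rotated, rotations

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)                                     # 4-way rotation classifier
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                             lr=1e-3)

unlabeled = torch.randn(32, 1, 28, 28)                      # stand-in unlabeled data
for step in range(5):
    x, y = make_rotation_batch(unlabeled)
    loss = nn.functional.cross_entropy(head(encoder(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After pretraining, the encoder's features can be reused for a downstream task.
```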

Methods to Enhance Data Efficiency

Semi-Supervised and Transfer Learning

Semi-supervised learning harnesses both labeled and unlabeled data to train models more effectively, using techniques such as pseudo-labeling and enforcing consistency on unlabeled inputs. Transfer learning, which involves adapting models pretrained on large, generic datasets to specific tasks, is another key strategy for reducing the amount of labeled data a new problem requires.
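
A minimal sketch of pseudo-labeling, one of the semi-supervised techniques named above: the model's confident predictions on unlabeled inputs are recycled as extra training targets. The confidence threshold, model, and random data here are illustrative assumptions, not values from the paper.

```python
# A minimal pseudo-labeling sketch (illustrative threshold, model, and data).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

labeled_x = torch.randn(16, 1, 28, 28)                      # small labeled set
labeled_y = torch.randint(0, 10, (16,))
unlabeled_x = torch.randn(64, 1, 28, 28)                    # no labels available

for step in range(5):
    # Supervised loss on the small labeled set.
    loss = nn.functional.cross_entropy(model(labeled_x), labeled_y)

    # Pseudo-labels: keep only predictions above a confidence threshold.
    with torch.no_grad():
        probs = model(unlabeled_x).softmax(dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf > 0.9                                   # assumed threshold
    if mask.any():
        loss = loss + nn.functional.cross_entropy(model(unlabeled_x[mask]),
                                                  pseudo_y[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```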
