A Survey of Deep Learning for Scientific Discovery (2003.11755v1)

Published 26 Mar 2020 in cs.LG and stat.ML

Abstract: Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.

Citations (118)

Summary

A Survey of Deep Learning for Scientific Discovery

Overview of the Paper

The paper "A Survey of Deep Learning for Scientific Discovery" takes a comprehensive look at how advances in deep neural networks can be applied across various scientific domains. With the increasing complexity and volume of data being generated in scientific fields, deep learning offers significant potential for breakthroughs in these areas. This survey aims to demystify the application of deep learning in scientific settings by providing a detailed examination of prevalent models, associated tasks, training methods, and techniques to enhance data efficiency and interpretability.

Core Deep Learning Techniques

Deep Learning Workflow

The deep learning workflow typically consists of three key stages: data preparation, learning, and validation (Figure 1). Data preparation involves steps like collection, labeling, and preprocessing. The learning component focuses on selecting appropriate models and tasks, while validation encompasses performance evaluations and analysis of model behavior.

Figure 1: Schematic of a typical deep learning workflow.
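
To make these three stages concrete, here is a minimal end-to-end sketch using scikit-learn on a toy digits dataset. The specific library, model, and dataset are illustrative choices, not ones prescribed by the survey.

```python
# A minimal sketch of the three-stage workflow (illustrative choices only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# 1. Data preparation: collect, split, and preprocess.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Learning: choose a model and task, then fit to the training data.
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
model.fit(X_train, y_train)

# 3. Validation: evaluate performance on held-out data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```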

Models and Tasks for Scientific Applications

The survey highlights various models such as Convolutional Neural Networks (CNNs) for visual data, Transformers for sequential data, and Graph Neural Networks (GNNs) for graph-structured data. Tasks pertinent to these models include image classification, object detection, semantic segmentation, and sequence-to-sequence prediction, offering a spectrum of applications from medical imaging to natural language processing.

Figure 2: The Supervised Learning process for training neural networks.
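
The supervised process in Figure 2 reduces to a loop of predicting, comparing predictions against labels, and updating parameters. The minimal PyTorch sketch below illustrates that loop; the tiny CNN and the random stand-in batch are assumptions for illustration, not details from the paper.

```python
# A minimal supervised-learning loop in PyTorch (illustrative model and data).
import torch
import torch.nn as nn

model = nn.Sequential(                      # small CNN for 28x28 grayscale images
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),            # 10-way classification head
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(32, 1, 28, 28)         # stand-in batch of labeled data
labels = torch.randint(0, 10, (32,))

for step in range(5):                       # predict, compare to labels, update
    logits = model(images)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```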

Self-Supervised Learning

Self-supervised learning, which derives training labels automatically from the data itself, is presented as a pivotal method for leveraging vast unlabeled datasets. This approach is exemplified by pretext tasks such as predicting image rotations or language modeling, which improve model robustness and feature extraction.

Figure 3: Training neural networks with Self-Supervision.
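
As an illustration of the rotation pretext task mentioned above, the sketch below rotates unlabeled images by a random multiple of 90 degrees and trains a network to predict the rotation; the encoder architecture and random data are illustrative assumptions.

```python
# A sketch of rotation-prediction self-supervision (illustrative model/data).
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Create (rotated image, rotation label) pairs from unlabeled images."""
    rotations = torch.randint(0, 4, (images.size(0),))      # label in {0,1,2,3}
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, rotations)])
    return rotated, rotations

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)                                     # 4-way rotation classifier
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                             lr=1e-3)

unlabeled = torch.randn(32, 1, 28, 28)                      # stand-in unlabeled data
for step in range(5):
    x, y = make_rotation_batch(unlabeled)
    loss = nn.functional.cross_entropy(head(encoder(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After pretraining, the encoder's features can be reused for a downstream task.
```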

Methods to Enhance Data Efficiency

Semi-Supervised and Transfer Learning

Semi-supervised learning harnesses both labeled and unlabeled data to train models more effectively, using techniques such as pseudo-labeling and enforcing consistency on unlabeled inputs. Transfer learning, which involves adapting models pretrained on large, generic datasets to specific tasks, is another key strategy for reducing the amount of labeled data a new problem requires.
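
A minimal sketch of pseudo-labeling, one of the semi-supervised techniques named above: the model's confident predictions on unlabeled inputs are recycled as extra training targets. The confidence threshold, model, and random data here are illustrative assumptions, not values from the paper.

```python
# A minimal pseudo-labeling sketch (illustrative threshold, model, and data).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

labeled_x = torch.randn(16, 1, 28, 28)                      # small labeled set
labeled_y = torch.randint(0, 10, (16,))
unlabeled_x = torch.randn(64, 1, 28, 28)                    # no labels available

for step in range(5):
    # Supervised loss on the small labeled set.
    loss = nn.functional.cross_entropy(model(labeled_x), labeled_y)

    # Pseudo-labels: keep only predictions above a confidence threshold.
    with torch.no_grad():
        probs = model(unlabeled_x).softmax(dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf > 0.9                                   # assumed threshold
    if mask.any():
        loss = loss + nn.functional.cross_entropy(model(unlabeled_x[mask]),
                                                  pseudo_y[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```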
