
A Survey of Deep Learning for Scientific Discovery (2003.11755v1)

Published 26 Mar 2020 in cs.LG and stat.ML

Abstract: Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.

Citations (118)

Summary

A Survey of Deep Learning for Scientific Discovery

Overview of the Paper

The paper "A Survey of Deep Learning for Scientific Discovery" takes a comprehensive look at how advances in deep neural networks can be applied across various scientific domains. With the increasing complexity and volume of data being generated in scientific fields, deep learning offers significant potential for breakthroughs in these areas. This survey aims to demystify the application of deep learning in scientific settings by providing a detailed examination of prevalent models, associated tasks, training methods, and techniques to enhance data efficiency and interpretability.

Core Deep Learning Techniques

Deep Learning Workflow

The deep learning workflow typically consists of three key stages: data preparation, learning, and validation (Figure 1). Data preparation involves steps like collection, labeling, and preprocessing. The learning component focuses on selecting appropriate models and tasks, while validation encompasses performance evaluation and analysis of model behavior.

Figure 1: Schematic of a typical deep learning workflow.
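To make the three stages concrete, here is a minimal end-to-end sketch in PyTorch. The synthetic data, model architecture, and hyperparameters are illustrative placeholders, not recommendations from the survey.

```python
# Minimal sketch of the three workflow stages (data preparation,
# learning, validation) using PyTorch. Everything here is a toy
# stand-in for a real scientific dataset and model.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# --- 1. Data preparation: collect, label, preprocess, split ---
X = torch.randn(1000, 20)             # stand-in for collected features
y = (X.sum(dim=1) > 0).long()         # stand-in for labels
train_ds = TensorDataset(X[:800], y[:800])
val_ds = TensorDataset(X[800:], y[800:])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)

# --- 2. Learning: choose a model/task and optimize ---
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for xb, yb in train_loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()

# --- 3. Validation: evaluate performance and inspect behavior ---
model.eval()
correct = 0
with torch.no_grad():
    for xb, yb in val_loader:
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
print(f"validation accuracy: {correct / len(val_ds):.2%}")
```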

Models and Tasks for Scientific Applications

The survey highlights various models such as Convolutional Neural Networks (CNNs) for visual data, Transformers for sequential data, and Graph Neural Networks (GNNs) for graph-structured data. Tasks pertinent to these models include image classification, object detection, semantic segmentation, and sequence-to-sequence prediction, offering a spectrum of applications from medical imaging to natural language processing.

Figure 2: The Supervised Learning process for training neural networks.
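As a concrete instance of supervised learning on visual data, the sketch below defines a small CNN classifier and runs one training step. The architecture, input size, and class count are illustrative assumptions rather than choices taken from the survey.

```python
# Minimal CNN image classifier illustrating supervised learning on
# visual data; shapes and layer sizes are illustrative only.
import torch
from torch import nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

# One supervised step: predict labels, compare to ground truth, update.
model = SmallCNN()
images = torch.randn(8, 3, 32, 32)      # batch of RGB images
labels = torch.randint(0, 10, (8,))     # ground-truth class labels
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()
```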

Self-Supervised Learning

Self-supervised learning, which derives training labels automatically from the data itself, is presented as a pivotal method for leveraging vast unlabelled datasets. The approach is exemplified by pretext tasks such as predicting image rotations or training language models to predict held-out words, which improve model robustness and feature extraction capabilities.

Figure 3: Training neural networks with Self-Supervision.
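To illustrate the rotation pretext task mentioned above, the following sketch generates rotation labels from unlabeled images and trains a network to predict them. The backbone architecture and shapes are hypothetical stand-ins.

```python
# Sketch of the rotation-prediction pretext task: rotate each unlabeled
# image by 0/90/180/270 degrees and train the network to predict which
# rotation was applied. No human labels are required.
import torch
from torch import nn

def make_rotation_batch(images: torch.Tensor):
    """Create (rotated image, rotation label) pairs from unlabeled images."""
    rotated, labels = [], []
    for img in images:
        k = torch.randint(0, 4, (1,)).item()        # 0..3 quarter-turns
        rotated.append(torch.rot90(img, k, dims=(1, 2)))
        labels.append(k)
    return torch.stack(rotated), torch.tensor(labels)

backbone = nn.Sequential(                           # feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(16, 4)                    # 4 rotation classes

unlabeled = torch.randn(8, 3, 32, 32)               # unlabeled image batch
x, y = make_rotation_batch(unlabeled)
loss = nn.CrossEntropyLoss()(rotation_head(backbone(x)), y)
loss.backward()   # the backbone learns features reusable downstream
```

After pretraining, the rotation head is typically discarded and the backbone's features are reused for the downstream scientific task.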

Methods to Enhance Data Efficiency

Semi-Supervised and Transfer Learning

Semi-supervised learning harnesses both labeled and unlabelled data to train models more effectively, using techniques such as pseudo-labeling and enforcing consistency on unlabelled inputs. Transfer learning, which involves adapting models pretrained on large, generic datasets to specific target tasks, is highlighted as a way to achieve strong performance even when labeled data for the target problem is scarce.
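To illustrate the semi-supervised side, here is a minimal sketch of pseudo-labeling: a model trained on the labeled set assigns provisional labels to confident unlabeled examples, which then contribute to the training loss. The model, confidence threshold, and data are illustrative assumptions.

```python
# Sketch of one pseudo-labeling step: combine a supervised loss on
# labeled data with a loss on confidently pseudo-labeled unlabeled data.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

labeled_x, labeled_y = torch.randn(64, 20), torch.randint(0, 2, (64,))
unlabeled_x = torch.randn(256, 20)

# Supervised loss on the labeled data.
sup_loss = loss_fn(model(labeled_x), labeled_y)

# Pseudo-label confident unlabeled examples (no gradient through labeling).
with torch.no_grad():
    probs = model(unlabeled_x).softmax(dim=1)
    conf, pseudo_y = probs.max(dim=1)
    keep = conf > 0.9                     # confidence threshold (assumed)

# Treat confident pseudo-labels as ground truth and combine the losses.
if keep.any():
    unsup_loss = loss_fn(model(unlabeled_x[keep]), pseudo_y[keep])
else:
    unsup_loss = torch.tensor(0.0)

(sup_loss + unsup_loss).backward()
opt.step()
```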
