Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 27 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 23 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 70 tok/s Pro
Kimi K2 117 tok/s Pro
GPT OSS 120B 459 tok/s Pro
Claude Sonnet 4 34 tok/s Pro
2000 character limit reached

A primer on model-guided exploration of fitness landscapes for biological sequence design (2010.10614v2)

Published 4 Oct 2020 in q-bio.QM, cs.LG, q-bio.BM, and q-bio.PE

Abstract: Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine learning inaccessible, and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen as a search process on a discrete, high-dimensional space, where each sequence is associated with a function. This sequence-to-function map is known as a "Fitness Landscape". Designing a sequence with a particular function is hence a matter of "discovering" such a (often rare) sequence within this space. Today we can build predictive models with good interpolation ability due to impressive progress in the synthesis and testing of biological sequences in large numbers, which enables model training and validation. However, it often remains a challenge to find useful sequences with the properties that we like using these models. In particular, in this primer we highlight that algorithms for experimental design, what we call "exploration strategies", are a related, yet distinct problem from building good models of sequence-to-function maps. We review advances and insights from current literature -- by no means a complete treatment -- while highlighting desirable features of optimal model-guided exploration, and cover potential pitfalls drawn from our own experience. This primer can serve as a starting point for researchers from different domains that are interested in the problem of searching a sequence space with a model, but are perhaps unaware of approaches that originate outside their field.

Citations (26)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Authors (2)