Designing RNA Secondary Structures is Hard (1710.11513v2)

Published 31 Oct 2017 in cs.DS, math.CO, q-bio.BM, and q-bio.QM

Abstract: An RNA sequence is a word over an alphabet on four elements $\{A, C, G, U\}$ called bases. RNA sequences fold into secondary structures where some bases match one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots, unpaired bases. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some model of energy and to design sequences of bases which will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the eighties, and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016). In stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has remained unsettled for decades. In this paper we show that, in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is NP-complete if one adds natural constraints of the form: index $i$ of the sequence has to be labeled by base $b$. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all RNA design software and they do correspond to actual practice.
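
The dot-bracket encoding described in the abstract can be illustrated with a short Python sketch (the function name `parse_dot_bracket` is hypothetical, not taken from the paper). A stack matches each closing parenthesis with the most recent unmatched opening one, which recovers the paired positions and rejects strings that are not well-parenthesized.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string.

    '(' and ')' mark paired bases and '.' marks an unpaired base.
    Raises ValueError if the string is not well-parenthesized.
    """
    stack: List[int] = []  # positions of '(' still waiting for a partner
    pairs: List[Tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For example, `parse_dot_bracket("((..)).()")` returns `[(0, 5), (1, 4), (7, 8)]`. Because the pairs come from a stack, they are always nested, which is exactly the pseudoknot-free condition.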

Summary

The paper shows that RNA secondary structure design in the Watson-Crick model, with constraints fixing the base at some positions, is NP-complete, via a reduction from E3-SAT.
It uses variable and clause gadgets with carefully designed parenthesis arches to enforce a strict linear order in sequence construction.
The result justifies the reliance on heuristics in practical RNA design software, since an exact polynomial-time algorithm for the constrained problem is unlikely unless P = NP.

Complexity of RNA Secondary Structure Design

Introduction

The paper "Designing RNA Secondary Structures is Hard" by Édouard Bonnet, Paweł Rzążewski, and Florian Sikora studies the computational complexity of designing RNA secondary structures in the simplest energy model, the Watson-Crick model. The design problem asks whether there exists a sequence that folds into a specified target secondary structure. Its complexity had remained open for decades, and Anne Condon raised it as a challenging open question (ICALP 2003), even though the problem is central to computational biology and bioinformatics.

RNA Secondary Structures and Computational Challenges

RNA molecules are nucleic acids built from nucleotides; a single strand folds back on itself, and complementary bases pair to form a secondary structure. Pseudoknot-free structures can be written as well-parenthesized expressions with dots, and understanding these folding patterns is crucial for deducing biological function. Predicting the pseudoknot-free structure of a given sequence is solved efficiently by dynamic programming, in cubic time and even in truly subcubic time (Bringmann et al., FOCS 2016). The inverse folding, or design, problem goes the other way: given a target secondary structure, construct a sequence that folds into it.
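
The cubic-time dynamic program behind folding prediction can be sketched for the Watson-Crick model. This is a minimal Nussinov-style sketch (the name `fold` is hypothetical) that assumes only A-U and C-G pairs are allowed, that every pair contributes equally, and that there is no minimum hairpin-loop length; real prediction tools use richer energy models. It returns the maximum number of nested pairs together with one optimal structure in dot-bracket notation.

```python
from typing import List, Tuple

# Assumption: the Watson-Crick model allows only A-U and C-G pairs (no G-U wobble),
# every pair counts 1, and there is no minimum hairpin-loop length.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fold(sequence: str) -> Tuple[int, str]:
    """Return (maximum number of nested pairs, one optimal dot-bracket structure).

    Nussinov-style dynamic program: O(n^3) time, O(n^2) space.
    """
    n = len(sequence)
    # best[i][j] = optimum for sequence[i..j] inclusive; ranges with i >= j stay 0.
    best: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            value = best[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + 1, j + 1):  # case 2: base i pairs with base k
                if (sequence[i], sequence[k]) in WATSON_CRICK:
                    value = max(value, best[i + 1][k - 1] + 1 + best[k + 1][j])
            best[i][j] = value

    # Traceback: rebuild one structure that achieves the optimum.
    structure = ["."] * n
    pending: List[Tuple[int, int]] = [(0, n - 1)]
    while pending:
        i, j = pending.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            pending.append((i + 1, j))  # an optimum leaves base i unpaired
            continue
        for k in range(i + 1, j + 1):
            if (sequence[i], sequence[k]) in WATSON_CRICK and (
                best[i][j] == best[i + 1][k - 1] + 1 + best[k + 1][j]
            ):
                structure[i], structure[k] = "(", ")"
                pending.extend([(i + 1, k - 1), (k + 1, j)])
                break
    return (best[0][n - 1] if n else 0), "".join(structure)
```

For example, `fold("GGGAAACCC")` returns `(3, "(((...)))")`. The recursion either leaves base i unpaired or pairs it with some k, which splits the interval into an inside and an outside part; this split is what keeps structures pseudoknot-free.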

Computational Complexity and NP-Hardness

The paper proves that RNA secondary structure design in the Watson-Crick model is NP-complete once constraints of the form "index $i$ must be labeled by base $b$" are added. Membership in NP is the easy direction, since a candidate sequence can be checked in polynomial time with the folding dynamic program. Hardness follows from a reduction from E3-SAT, the variant of 3-SAT in which every clause has exactly three literals, using variable gadgets and clause gadgets that are interleaved along the sequence.
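
Membership in NP rests on the fact that a proposed sequence can be verified in polynomial time. The self-contained sketch below uses hypothetical helper names (`target_pairs`, `max_pairs`, `is_valid_design`) and assumes that a sequence designs a structure when it satisfies the fixed-base constraints, every target pair is an A-U or C-G pair, and the target attains the maximum number of nested Watson-Crick pairs for that sequence. It does not check whether that optimum is unique, which some formulations of the problem may require.

```python
from typing import Dict, List, Tuple

# Assumption: the Watson-Crick model allows only A-U and C-G pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def target_pairs(structure: str) -> List[Tuple[int, int]]:
    """Base pairs of a dot-bracket string (assumed well-parenthesized)."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            pairs.append((stack.pop(), index))
    return pairs


def max_pairs(sequence: str) -> int:
    """Nussinov-style O(n^3) maximum number of nested Watson-Crick pairs."""
    n = len(sequence)
    best = [[0] * (n + 1) for _ in range(n + 1)]  # empty ranges stay 0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            value = best[i + 1][j]  # base i unpaired
            for k in range(i + 1, j + 1):  # base i paired with base k
                if (sequence[i], sequence[k]) in WATSON_CRICK:
                    value = max(value, best[i + 1][k - 1] + 1 + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1] if n else 0


def is_valid_design(sequence: str, structure: str, fixed: Dict[int, str]) -> bool:
    """Polynomial-time certificate check for constrained design (hypothetical helper).

    Accepts when the sequence (1) has the right length, (2) respects every fixed
    base, (3) pairs each target bracket pair with A-U or C-G, and (4) cannot form
    more pairs than the target. Uniqueness of the optimum is NOT checked.
    """
    if len(sequence) != len(structure):
        return False
    if any(sequence[i] != base for i, base in fixed.items()):
        return False
    pairs = target_pairs(structure)
    if any((sequence[i], sequence[j]) not in WATSON_CRICK for i, j in pairs):
        return False
    return len(pairs) == max_pairs(sequence)
```

For instance, `is_valid_design("GAAC", "....", {})` is `False`: the target leaves every base unpaired, but the G and C at the two ends can pair, so the target is not optimal for that sequence.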

Key to the proof are arches of parentheses of various widths and a linear order that interleaves variables with clauses; together they rule out unwanted alternative foldings, so that the target structure is designable exactly when the formula is satisfiable. The additional constraints are natural: fixing the base at given positions is supported by practical RNA design software and matches real-world practice, for example in the EteRNA project.

Implications and Future Directions

The authors note that this negative result suggests the same lower bound holds for more realistic energy models, which underlines the need for better algorithms for RNA design. Current tools rely on heuristics or exponential-time algorithms, which limits exact design to small molecules or simple structures.
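
To see why exact design becomes infeasible quickly, here is a deliberately naive exhaustive search (the name `brute_force_design` is hypothetical, not an algorithm from the paper), under the same assumptions as above: only A-U and C-G pairs, and a sequence designs the target when the target attains the maximum number of nested pairs. It tries all 4^n sequences and returns the first one that respects the fixed bases, pairs every bracket with a Watson-Crick pair, and passes the optimality check. With the O(n^3) check, the running time is O(4^n n^3), so it is only usable on toy inputs.

```python
from itertools import product
from typing import Dict, List, Optional

# Assumption: the Watson-Crick model allows only A-U and C-G pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
BASES = "ACGU"


def max_pairs(sequence: str) -> int:
    """Nussinov-style O(n^3) maximum number of nested Watson-Crick pairs."""
    n = len(sequence)
    best = [[0] * (n + 1) for _ in range(n + 1)]  # empty ranges stay 0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            value = best[i + 1][j]  # base i unpaired
            for k in range(i + 1, j + 1):  # base i paired with base k
                if (sequence[i], sequence[k]) in WATSON_CRICK:
                    value = max(value, best[i + 1][k - 1] + 1 + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1] if n else 0


def brute_force_design(structure: str, fixed: Dict[int, str]) -> Optional[str]:
    """Try all 4^n sequences; return the first that designs `structure`, else None.

    Hypothetical helper for toy inputs only: O(4^n * n^3) time.
    """
    n = len(structure)
    partner: List[Optional[int]] = [None] * n  # index paired with i, or None
    stack: List[int] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            opener = stack.pop()
            partner[opener], partner[index] = index, opener
    target_pairs = sum(p is not None for p in partner) // 2
    for letters in product(BASES, repeat=n):
        sequence = "".join(letters)
        if any(sequence[i] != base for i, base in fixed.items()):
            continue  # violates a "position i must carry base b" constraint
        if any(
            p is not None and i < p and (sequence[i], sequence[p]) not in WATSON_CRICK
            for i, p in enumerate(partner)
        ):
            continue  # some target pair is not a Watson-Crick pair
        if max_pairs(sequence) == target_pairs:
            return sequence  # the target attains the maximum pair count
    return None
```

For example, `brute_force_design("....", {0: "G", 3: "C"})` returns `None`: any sequence with G at position 0 and C at position 3 can pair those two bases, so the all-unpaired target is never optimal. This shows how position constraints alone can make a structure undesignable.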

The authors also point to directions for future work, including the complexity of design without position constraints and in more realistic energy models. Beyond theory, the result matters for fields that depend on RNA design, such as pharmaceutical research, synthetic biology, and the engineering of RNA nanostructures, where efficient computational strategies are needed.

Conclusion

This work places RNA secondary structure design with position constraints among the NP-complete problems, a significant step toward the open question raised by Anne Condon. It provides a rigorous foundation for future work at the intersection of computational theory and biology, on which researchers in computational biology and computer science can build to advance both theoretical algorithms and practical RNA design tools.
