A Space-Efficient Approach towards Distantly Homologous Protein Similarity Searches

Published 25 Aug 2015 in cs.CE and q-bio.QM | (1508.06561v1)

Abstract: Protein similarity searches are a routine job for molecular biologists where a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped into two categories, either solving the problem using sequence alignment through dynamic programming, or, employing certain heuristic measures to perform an initial screening followed by applying an optimal sequence alignment algorithm to the closest matching candidates. While the first approach suffers from huge time and space demands, the latter approach might miss some protein sequences which are distantly related to the query sequence. In this paper, we propose a heuristic pair-wise sequence alignment algorithm that can be efficiently employed for protein database searches for moderately sized databases. The proposed algorithm is sufficiently fast to be applicable to database searches for short query sequences, has constant auxiliary space requirements, produces good alignments, and is sensitive enough to return even distantly related protein chains that might be of interest.