Multiple sequence alignment for short sequences

Kristóf Takács

Multiple sequence alignment (MSA) has been one of the most important problems in bioinformatics for several decades, and it is still studied intensively by many mathematicians and biologists. However, largely because of the practical motivation behind the problem, research on this topic has focused on aligning long sequences. This is understandable, since the sequences that need to be aligned (usually DNA or protein sequences) are generally quite long (e.g., at least 30-40 characters). Nevertheless, it remains a challenging question exactly where MSA starts to become a truly hard problem (it is known that MSA is NP-complete [2]), and the key to answering this question is to examine short sequences. If the optimal alignment of short sequences could be determined in polynomial time, these results might help to develop faster or more accurate heuristic algorithms for aligning long sequences. In this work, it is shown that for length-1 sequences under an arbitrary metric, as well as for length-2 sequences under the unit metric, the optimum of the MSA problem is achieved by the trivial alignment.
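The final claim can be checked by brute force on tiny inputs. The sketch below rests on assumptions that are not stated in the text above: the objective is taken to be the sum-of-pairs (SP) cost, the metric is defined on the alphabet extended by a gap symbol `-`, and the "trivial alignment" is taken to mean stacking the equal-length sequences column by column without any gaps. The function names (`optimal_sp_cost`, `trivial_sp_cost`, `unit_metric`, `column_cost`) are hypothetical. The exact optimum is computed with the textbook dynamic program over vectors of sequence positions, which is exponential in the number of sequences and therefore only usable for very small examples.

```python
from functools import lru_cache
from itertools import combinations, product

GAP = "-"  # assumed gap symbol


def unit_metric(x: str, y: str) -> int:
    """Unit metric on the alphabet extended by the gap symbol."""
    return 0 if x == y else 1


def column_cost(column, d) -> int:
    """Sum-of-pairs cost of one column (gap-gap pairs cost d(-,-) = 0)."""
    return sum(d(x, y) for x, y in combinations(column, 2))


def optimal_sp_cost(seqs, d) -> int:
    """Exact minimum SP cost via DP over position vectors (tiny inputs only)."""
    k = len(seqs)
    ends = tuple(len(s) for s in seqs)
    # Each column advances a non-empty subset of the sequences.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos):
        if pos == ends:
            return 0
        result = float("inf")
        for m in moves:
            if any(p + step > e for p, step, e in zip(pos, m, ends)):
                continue  # this move would run past the end of some sequence
            column = [s[p] if step else GAP for s, p, step in zip(seqs, pos, m)]
            nxt = tuple(p + step for p, step in zip(pos, m))
            result = min(result, column_cost(column, d) + best(nxt))
        return result

    return best(tuple(0 for _ in seqs))


def trivial_sp_cost(seqs, d) -> int:
    """SP cost of stacking equal-length sequences with no gaps."""
    return sum(column_cost(col, d) for col in zip(*seqs))


if __name__ == "__main__":
    pair = ("abc", "bca")
    print(optimal_sp_cost(pair, unit_metric), trivial_sp_cost(pair, unit_metric))
```

Running the script prints `2 3`: under these assumptions, the length-3 pair `abc`/`bca` is aligned more cheaply as `abc-`/`-bca` than by the trivial alignment. This is consistent with the length-2 bound for the unit metric being where the trivial alignment stops being guaranteed optimal.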