Multiple genome rearrangement and breakpoint phylogeny

J Comput Biol. 1998 Fall;5(3):555-70. doi: 10.1089/cmb.1998.5.555.

Abstract

Multiple alignment of macromolecular sequences generalizes from N = 2 to N ≥ 3 the comparison of N sequences which have diverged through the local processes of insertion, deletion and substitution. Gene-order sequences diverge through non-local genome rearrangement processes such as inversion (or reversal) and transposition. In this paper we show which formulations of multiple alignment have counterparts in multiple rearrangement. Based on difficulties inherent in rearrangement edit-distance calculation and interpretation, we argue for the simpler "breakpoint analysis." Consensus-based multiple rearrangement of N ≥ 3 orders can be solved exactly through reduction to instances of the Travelling Salesman Problem (TSP). We propose a branch-and-bound solution to TSP particularly suited to these instances. Simulations show how non-uniqueness of the solution is attenuated with increasing numbers of data genomes. Tree-based multiple alignment can be achieved to a great degree of accuracy by decomposing the tree into a number of overlapping 3-stars centered on the non-terminal nodes, and solving the consensus-based problem iteratively for these nodes until convergence. Accuracy improves with very careful initializations at the non-terminal nodes. The degree of non-uniqueness of solutions depends on the position of the node in the tree in terms of path length to the terminal vertices.
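
To make the breakpoint idea concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes unsigned, circular gene orders over an identical gene set, and counts a breakpoint wherever two genes adjacent in one order are not adjacent in the other. The consensus problem is illustrated by exhaustive search over a toy instance; the abstract's reduction to TSP and its branch-and-bound solver are not reproduced here. The function names (`adjacencies`, `breakpoints`, `brute_force_median`) are hypothetical.

```python
from itertools import permutations


def adjacencies(order):
    """Unordered adjacent gene pairs of an unsigned circular gene order (assumed model)."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoints(a, b):
    """Count adjacencies of order `a` that do not occur in order `b`."""
    return len(adjacencies(a) - adjacencies(b))


def brute_force_median(genomes):
    """Exhaustively search for an order minimizing total breakpoints to all genomes.

    Feasible only for a handful of genes; this stands in for the exact
    TSP-based solution mentioned in the abstract.
    """
    genes = sorted(genomes[0])
    first, rest = genes[0], genes[1:]
    best_order, best_cost = None, None
    # Fix the first gene so rotations of the same circle are not re-enumerated.
    for perm in permutations(rest):
        candidate = (first,) + perm
        cost = sum(breakpoints(candidate, g) for g in genomes)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = list(candidate), cost
    return best_order, best_cost


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5], [1, 2, 4, 3, 5], [1, 3, 2, 4, 5]]
    order, cost = brute_force_median(data)
    print("median order:", order, "total breakpoints:", cost)
```

Several candidate orders can attain the same minimum total, which is the non-uniqueness the abstract refers to; the sketch simply returns the first optimum it encounters.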

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Gene Rearrangement*
  • Genome
  • Phylogeny*
  • Sequence Alignment*