Mapping individual gene data on an evolutionary tree

B. Mirkin, T.I. Fenner, G. Loizou

________________________________________________________________________

Project outline and aims

Evolutionary trees are an important instrument in inter-genome analysis. Traditionally, computational biology has focused on the problem of tree building, which can be formulated as follows: given some data on a set of extant species, build a (rooted) tree whose leaves correspond to the extant species and whose interior nodes correspond to their ancestors, in such a way that more similar species are separated by more recent divergence events. This project is devoted to a related problem: developing methods for interpreting various types of data on the extant species by mapping them in a biologically meaningful way onto an evolutionary tree and annotating the tree nodes with relevant evolutionary events.

In particular, we are concerned with three specific projects:

With I. Muchnik (Rutgers, NJ, USA) and M. Vingron (currently Berlin, Germany), we develop mathematical models and algorithms addressing the following problem. Given an evolutionary species tree and a set of trees built on the same extant species according to similarity between individual gene families, find a mapping of the individual gene trees onto the species tree that exhibits the gene duplications and losses needed to account for the differences. We have developed a so-called annotating model for comparing gene and species trees and established its relations with two other existing models, the reconciled tree and LCA mapping; see

O. Eulenstein, B. Mirkin, and M. Vingron (1997) Comparison of annotating duplication, tree mapping, and copying as methods to compare gene trees with species trees, in B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (Eds.) Mathematical Hierarchies and Biology, DIMACS Series, V. 37, Providence: AMS, 71-94.

O. Eulenstein, B. Mirkin, and M. Vingron (1998) Duplication-based measures of difference between gene and species trees, Journal of Computational Biology, 5, 135-148.

B. Mirkin (2004) Mapping gene family data onto evolutionary trees, in M. Chavent, O. Dordan, C. Lacomblez, M. Langlais, and B. Patouille (Eds.), Comptes rendus des 11es Rencontres de la Societe Francophone de Classification, University of Bordeaux, 61-68.
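To make the gene tree versus species tree comparison concrete, the following is a minimal sketch of the LCA mapping mentioned above, assuming binary trees encoded as nested tuples with species names at the leaves. The encoding and the function names (leaf_set, species_clusters, lca, duplication_nodes) are hypothetical choices for illustration; this is not the annotating model developed in the project, only the standard rule that a gene-tree node is marked as a duplication when it maps to the same species-tree node as one of its children.

```python
def leaf_set(tree):
    """Return the set of species names under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def species_clusters(species_tree):
    """List the leaf sets (clusters) of all nodes of the species tree."""
    clusters = []

    def walk(node):
        clusters.append(leaf_set(node))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(species_tree)
    return clusters


def lca(clusters, species):
    """Smallest species-tree cluster containing the given set of species."""
    return min((c for c in clusters if species <= c), key=len)


def duplication_nodes(gene_tree, species_tree):
    """Map the gene tree onto the species tree and collect duplications.

    Each gene-tree node is mapped to the LCA of the images of its children.
    A node mapped to the same cluster as one of its children is recorded
    as a duplication (assumed rule, see the lead-in).
    """
    clusters = species_clusters(species_tree)
    duplications = []

    def walk(node):
        if isinstance(node, str):
            return lca(clusters, frozenset([node]))
        left, right = node
        image_left, image_right = walk(left), walk(right)
        image = lca(clusters, image_left | image_right)
        if image == image_left or image == image_right:
            duplications.append(image)
        return image

    walk(gene_tree)
    return duplications


# Hypothetical three-species example.
species_tree = (("a", "b"), "c")
gene_tree = (("a", "c"), "b")
print(duplication_nodes(gene_tree, species_tree))
```

In this toy example the gene tree disagrees with the species tree, and the sketch reports a single duplication placed at the root cluster {a, b, c}; a gene tree identical to the species tree yields none.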

With E. Koonin and Y. Wolf (NCBI, Bethesda, USA), we develop algorithms and run computations addressing the following problem. Given an evolutionary species tree and patterns of presence/absence of a number of genes in the extant species, find hypothetical evolutionary scenarios explaining the patterns by gene emergence, horizontal transfer, and gene loss at various extant and ancestral tree nodes. We developed an algorithm for finding maximally parsimonious scenarios and applied it to about 3000 COG phylogenetic patterns on a set of 26 species, using different gene gain penalty weights. The last universal common ancestor (LUCA) reconstructed with equal loss and gain penalty weights contained 572 genes and appeared most compatible with the hypothetical real ancestor, which led us to suggest that horizontal transfer was as frequent as loss. This approach is now being extended to handle the maximum likelihood criterion and to use information on similarities between proteins representing the same gene. This will help in better reconstructing the genome contents of ancestral species as well as in delineating gene histories; see

B. Mirkin, T. Fenner, M. Galperin, and E. Koonin (2003) Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes, BMC Evolutionary Biology, 3:2.

B. Mirkin and E. Koonin (2003) A top-down method for building genome classification trees with linear binary hierarchies, in M. Janowitz, J.-F. Lapointe, F. McMorris, B. Mirkin, and F. Roberts (Eds.) Bioconsensus, DIMACS Series, V. 61, Providence: AMS, 97-112.

K.S. Makarova, Y.I. Wolf, S.L. Mekhedov, B. Mirkin, and E.V. Koonin (2005) Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell, Nucleic Acids Research, 33(14), 4626-4638.
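The role of the gain penalty weight in the presence/absence problem above can be illustrated with a small weighted-parsimony sketch. It is a generic two-state dynamic programme over the tree, not the algorithm of the cited papers. The assumptions are: trees are nested tuples with species names at the leaves, a single gain cost stands for both emergence and horizontal transfer, and the gene is taken to be absent above the root, so that emergence at the root costs one gain. The function name parsimony_cost and its parameters are hypothetical.

```python
def parsimony_cost(tree, present, gain=1.0, loss=1.0):
    """Minimum weighted number of gains and losses explaining a pattern.

    tree    -- nested tuples with species names (str) at the leaves
    present -- set of extant species that carry the gene
    Returns (minimum cost, whether the gene is inferred at the root).
    Ties between root states are resolved in favour of absence.
    """

    def step(parent_state, child_state):
        # Cost of one branch: 0 -> 1 is a gain, 1 -> 0 is a loss.
        if parent_state == child_state:
            return 0.0
        return gain if child_state == 1 else loss

    def walk(node):
        # Returns {state: minimum cost of the subtree given that state}.
        if isinstance(node, str):
            observed = 1 if node in present else 0
            return {observed: 0.0, 1 - observed: float("inf")}
        child_costs = [walk(child) for child in node]
        return {
            s: sum(min(step(s, t) + cost[t] for t in (0, 1))
                   for cost in child_costs)
            for s in (0, 1)
        }

    root_costs = walk(tree)
    # Assumption: the gene is absent above the root.
    totals = {s: step(0, s) + root_costs[s] for s in (0, 1)}
    best = min(totals, key=totals.get)
    return totals[best], best == 1


# Hypothetical four-species example: the gene is missing only in "c".
tree = ((("a", "b"), "c"), "d")
pattern = {"a", "b", "d"}
print(parsimony_cost(tree, pattern, gain=2.0, loss=1.0))  # (3.0, True)
print(parsimony_cost(tree, pattern, gain=0.5, loss=1.0))  # (1.0, False)
```

With an expensive gain the cheapest scenario places the gene at the root and loses it once in "c"; with a cheap gain the same pattern is explained by two independent gains and the root stays empty. This is the kind of sensitivity to the gain penalty that the computations described above explore.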