Human Genome Variation and the concept of Genotype Networks
Giovanni Marco Dall’Olio (1), Jaume Bertranpetit (1), Andreas Wagner (2, 3, 4), Hafid Laayouni (1) ((1) Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, Barcelona, Spain. (2) Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland. (3) The Swiss Institute of Bioinformatics, Lausanne, Switzerland. (4) The Santa Fe Institute, Santa Fe, USA.)
(Submitted on 3 Sep 2013)
In 1970, John Maynard Smith introduced the concept of “Protein Space”, a representation of all possible protein sequences, as a framework to describe how evolutionary processes take place. Since then, the concepts of protein space and of networks of sequences have been applied to a variety of systems, from protein modeling to RNA evolution and metabolic systems. Here, we adapted these concepts to the analysis of human DNA sequence data. We focused on the variation that can be captured by Single Nucleotide Variant (SNV) data, and we used the 1000 Genomes dataset to determine how human populations have explored this genotype space.
Our results include a genome-wide survey of how the genotype networks of human populations vary along the genome, and a framework to calculate the properties of these networks from sequencing data. Moreover, we found that, in coding regions, these networks tend to be both more “extended” in genotype space and more connected than in non-coding regions. Applying the concept of genotype networks provides a new opportunity to understand the evolutionary processes that shaped our genome. If we learn how human populations have explored the genotype space, we can better understand how selective pressures such as pathogens and diseases have shaped the evolution of a given region of the genome, and how different regions have evolved. Combined with the availability of larger sequencing datasets, genotype networks represent a new approach to the study of human genetic diversity.
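To make the idea of a genotype network concrete, the following is a minimal Python sketch, not the framework used in the study. It rests on several assumptions that the abstract does not state: each genotype is encoded as a string of 0s and 1s (reference or alternate allele at each SNV in a region), two genotypes are connected when they differ at exactly one position, and "connected" and "extended" are approximated by the average degree and the average shortest path length. The function names and the sample data are hypothetical.

```python
from collections import deque
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_network(haplotypes):
    """Build an adjacency map over the distinct observed genotypes.

    Assumption: an edge joins two genotypes that differ at exactly one SNV.
    """
    nodes = sorted(set(haplotypes))
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of neighbours per genotype (a proxy for connectedness)."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def average_path_length(adjacency):
    """Mean shortest path length over all reachable pairs (a proxy for extension)."""
    total = 0
    pairs = 0
    for source in adjacency:
        # Breadth-first search from each genotype.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical sample: six haplotypes over three SNVs (one duplicate).
sample = ["000", "001", "011", "111", "000", "110"]
network = build_network(sample)
n_genotypes = len(network)
mean_degree = average_degree(network)
mean_path = average_path_length(network)
print(n_genotypes, mean_degree, mean_path)
```

In this toy sample the five distinct genotypes form a single chain, so the network is sparsely connected but spans the full width of the three-SNV space. A real analysis would build such networks per genomic window and per population before comparing regions.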