The missing heritability revealed in Arabidopsis thaliana

Xia Shen
(Submitted on 30 Jul 2013)

Although high-throughput genomic data are widely available, a large proportion of the narrow-sense heritability of many complex traits has not been successfully uncovered. In this study, focusing on phenotype prediction, I show that by properly selecting a small number of loci, a significant amount of missing heritability can be revealed. The results provide new insights into the missing heritability problem and the underlying genetic architecture of complex traits.


The Population Genetic Signature of Polygenic Local Adaptation

Jeremy J. Berg, Graham Coop
(Submitted on 29 Jul 2013)

Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation, we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values among populations. This test is a natural generalization of QST/FST comparisons based on GWAS predictions. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles. We apply our tests to the human genome diversity panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
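The "weighted sums of allele frequencies" mentioned in the abstract can be illustrated with a short sketch. This is not the authors' code: the function name, the input layout, and the factor of 2 (counting two allele copies per diploid individual) are assumptions made for illustration only.

```python
def genetic_values(effect_sizes, allele_freqs):
    """Estimate a mean additive genetic value per population.

    effect_sizes: list of GWAS effect sizes, one per locus (hypothetical input).
    allele_freqs: dict mapping population name -> list of frequencies of the
                  effect allele at the same loci, in the same order.
    Returns a dict mapping population name -> weighted sum of frequencies.
    """
    values = {}
    for pop, freqs in allele_freqs.items():
        if len(freqs) != len(effect_sizes):
            raise ValueError(f"population {pop!r} has the wrong number of loci")
        # Assumption: multiply by 2 because a diploid carries two allele copies.
        values[pop] = 2.0 * sum(a * p for a, p in zip(effect_sizes, freqs))
    return values


if __name__ == "__main__":
    effects = [0.5, -0.25]
    freqs = {"pop_A": [0.5, 0.5], "pop_B": [1.0, 0.0]}
    print(genetic_values(effects, freqs))
```

Differences among the resulting per-population values are what the tests described in the abstract would compare against neutral expectations; that modeling step is not shown here.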

A path integral formulation of the Wright-Fisher process with genic selection

Joshua G. Schraiber
(Submitted on 29 Jul 2013)

The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient.
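For readers unfamiliar with the underlying model, the following is a minimal simulation sketch of the discrete Wright-Fisher process with genic selection, the process whose diffusion limit the abstract refers to. It does not implement the path integral or the perturbation series; the function names and the parameterization (relative fitness 1 + s for the selected allele, a fixed number of gene copies per generation) are assumptions chosen for illustration.

```python
import random


def wright_fisher_step(count, pop_size, s, rng):
    """Sample the next generation's allele count by binomial resampling.

    count: current number of copies of the selected allele.
    pop_size: total number of gene copies in the population.
    s: selection coefficient (the selected allele has fitness 1 + s).
    """
    x = count / pop_size
    # Allele frequency after selection, before random sampling.
    p = x * (1 + s) / (1 + s * x)
    # Binomial(pop_size, p) draw built from independent Bernoulli trials.
    return sum(1 for _ in range(pop_size) if rng.random() < p)


def simulate(initial_count, pop_size, s, generations, seed=0):
    """Return the allele-count trajectory, including the starting count."""
    rng = random.Random(seed)
    trajectory = [initial_count]
    for _ in range(generations):
        trajectory.append(wright_fisher_step(trajectory[-1], pop_size, s, rng))
    return trajectory


if __name__ == "__main__":
    print(simulate(initial_count=10, pop_size=100, s=0.05, generations=20))
```

Repeating such simulations many times gives an empirical transition density against which an analytical approximation could, in principle, be compared.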

Ancient west Eurasian ancestry in southern and eastern Africa

Joseph K. Pickrell, Nick Patterson, Po-Ru Loh, Mark Lipson, Bonnie Berger, Mark Stoneking, Brigitte Pakendorf, David Reich
(Submitted on 30 Jul 2013)

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700-3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.

Characterizing Compatibility and Agreement of Unrooted Trees via Cuts in Graphs

Sudheer Vakati, David Fernández-Baca
(Submitted on 30 Jul 2013)

Deciding whether there is a single tree (a supertree) that summarizes the evolutionary information in a collection of unrooted trees is a fundamental problem in phylogenetics. We consider two versions of this question: agreement and compatibility. In the first, the supertree is required to reflect precisely the relationships among the species exhibited by the input trees. In the second, the supertree can be more refined than the input trees.
Tree compatibility can be characterized in terms of the existence of a specific kind of triangulation in a structure known as the display graph. Alternatively, it can be characterized as a chordal graph sandwich problem in a structure known as the edge label intersection graph. Here, we show that the latter characterization yields a natural characterization of compatibility in terms of minimal cuts in the display graph, which is closely related to compatibility of splits. We then derive a characterization for agreement.
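The abstract relates its result to compatibility of splits. As background, the sketch below checks the standard pairwise condition for two splits over the same taxon set: they are compatible exactly when at least one of the four pairwise intersections of their sides is empty. This is not the cut-based characterization from the paper, and the function name and input format are assumptions for illustration.

```python
def splits_compatible(split1, split2):
    """Return True if two splits (bipartitions) of the same taxa are compatible.

    Each split is a pair of collections of taxon labels, e.g.
    ({"a", "b"}, {"c", "d"}). Both splits are assumed to partition the
    same taxon set; that assumption is not validated here.
    """
    a, b = (frozenset(side) for side in split1)
    c, d = (frozenset(side) for side in split2)
    # Compatible iff some side of split1 is disjoint from some side of split2.
    return any(not (x & y) for x in (a, b) for y in (c, d))


if __name__ == "__main__":
    ab_cd = ({"a", "b"}, {"c", "d"})
    ac_bd = ({"a", "c"}, {"b", "d"})
    print(splits_compatible(ab_cd, ac_bd))  # the two quartet splits conflict
```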

Sibelia: A scalable and comprehensive synteny block generation tool for closely related microbial genomes

Ilya Minkin, Anand Patel, Mikhail Kolmogorov, Nikolay Vyahhi, Son Pham
(Submitted on 30 Jul 2013)

Comparing strains within the same microbial species has proven effective in the identification of genes and genomic regions responsible for virulence, as well as in the diagnosis and treatment of infectious diseases. In this paper, we present Sibelia, a tool for finding synteny blocks in multiple closely related microbial genomes using iterative de Bruijn graphs. Unlike most other tools, Sibelia can find synteny blocks that are repeated within genomes as well as blocks shared by multiple genomes. It represents synteny blocks in a hierarchical structure with multiple layers, each of which represents a different granularity level. Sibelia has been designed to work efficiently with a large number of microbial genomes; it finds synteny blocks in 31 S. aureus genomes within 31 minutes and in 59 E. coli genomes within 107 minutes on a standard desktop. Sibelia software is distributed under the GNU GPL v2 license and is available at: this https URL. Sibelia's web server is available at: this http URL.
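To make the de Bruijn graph idea concrete, here is a minimal sketch of building such a graph from a set of sequences for a single value of k. It is not Sibelia's implementation and does not cover the iterative, multi-k procedure or the synteny block extraction; the function name and the adjacency-set representation are assumptions for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Every k-mer found in any input sequence adds a directed edge from its
    (k-1)-length prefix to its (k-1)-length suffix. Returns a dict mapping
    each prefix node to the set of its successor nodes.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two toy "genomes" that share the prefix ACG and then diverge.
    print(de_bruijn_graph(["ACGT", "ACGA"], k=3))
```

In this toy example the node shared by both sequences branches where they diverge, which is the kind of structure a synteny tool would need to resolve.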

Exploring Genome Characteristics and Sequence Quality Without a Reference

Jared T. Simpson
(Submitted on 30 Jul 2013)

The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly. This paper addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of DNA sequence reads. The software implementation calculates per-base error rates, paired-end fragment size histograms and coverage metrics in the absence of a reference genome. Additionally, the software will estimate characteristics of the sequenced genome, such as repeat content and heterozygosity, that are key determinants of assembly difficulty. The software described is freely available and open source under the GNU Public License.

Comprehensive analysis of imprinted genes in maize reveals limited conservation with other species and allelic variation for imprinting

Amanda J. Waters, Paul Bilinski, Steve R. Eichten, Matthew W. Vaughn, Jeffrey Ross-Ibarra, Mary Gehring, Nathan M. Springer
(Submitted on 29 Jul 2013)

In plants, a subset of genes exhibit imprinting in endosperm tissue such that expression is primarily from the maternal or paternal allele. Imprinting may arise as a consequence of mechanisms for silencing of transposons during reproduction, and in some cases imprinted expression of particular genes may provide a selective advantage such that it is conserved across species. Separate mechanisms for the origin of imprinted expression patterns and maintenance of these patterns may result in substantial variation in the targets of imprinting in different species. Here we present deep sequencing of RNAs isolated from reciprocal crosses of four diverse maize genotypes, providing a comprehensive analysis of imprinting in maize that allows evaluation of imprinting at more than 95% of endosperm-expressed genes. We find that over 500 genes exhibit statistically significant parent-of-origin effects in maize endosperm tissue, but focus our analyses on a subset of these genes that had >90% expression from the maternal allele (69 genes) or from the paternal allele (108 genes) in at least one reciprocal cross. Over 10% of imprinted genes show evidence of allelic variation for imprinting. A comparison of imprinting in maize and rice reveals that only 13% of genes with syntenic orthologs in both species exhibit conserved imprinting. Genes that exhibit conserved imprinting in maize relative to rice have elevated dN/dS ratios compared to other imprinted genes, suggesting a history of more rapid evolution. Together, these data suggest that imprinting only has functional relevance at a subset of loci that currently exhibit imprinting in maize.

The genome of the medieval Black Death agent

The genome of the medieval Black Death agent (extended abstract)
Ashok Rajaraman, Eric Tannier, Cedric Chauve
(Submitted on 29 Jul 2013)

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most Yersinia pestis extant species, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, aiming at reconstructing the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to exceptional genome plasticity and an increase in rearrangement rate.

The genomic impacts of drift and selection for hybrid performance in maize

Justin P. Gerke, Jode W. Edwards, Katherine E. Guill, Jeffrey Ross-Ibarra, Michael D. McMullen
(Submitted on 27 Jul 2013)

Modern maize breeding relies upon selection in inbreeding populations to improve the performance of cross-population hybrids. The United States Department of Agriculture – Agricultural Research Service reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest standing models of selection for hybrid performance. To investigate the genomic impact of this selection program, we used the Illumina MaizeSNP50 high-density SNP array to determine genotypes of progenitor lines and over 600 individuals across multiple cycles of selection. Consistent with previous research (Messmer et al., 1991; Labate et al., 1997; Hagdorn et al., 2003; Hinze et al., 2005), we found that genetic diversity within each population steadily decreases, with a corresponding increase in population structure. High marker density also enabled the first view of haplotype ancestry, fixation and recombination within this historic maize experiment. Extensive regions of haplotype fixation within each population are visible in the pericentromeric regions, where large blocks trace back to single founder inbreds. Simulation attributes most of the observed reduction in genetic diversity to genetic drift. Signatures of selection were difficult to observe in the background of this strong genetic drift, but heterozygosity in each population has fallen more than expected. Regions of haplotype fixation represent the most likely targets of selection, but as observed in other germplasm selected for hybrid performance (Feng et al., 2006), there is no overlap between the most likely targets of selection in the two populations. We discuss how this pattern is likely to occur during selection for hybrid performance, and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome.