The pattern and distribution of deleterious mutations in maize

Sofiane Mezmouk, Jeffrey Ross-Ibarra
(Submitted on 2 Aug 2013)

Most non-synonymous mutations are thought to be deleterious because of their effect on protein sequence. These polymorphisms are expected to be removed or kept at low frequency by the action of natural selection, and rare deleterious variants have been implicated as a possible explanation for the “missing heritability” seen in many studies of complex traits. Nonetheless, the effect of positive selection on linked sites or drift in small or inbred populations may also impact the evolution of deleterious alleles. Here, we made use of genome-wide genotyping data to characterize deleterious variants in a large panel of maize inbred lines. We show that, in spite of small effective population sizes and inbreeding, most putatively deleterious SNPs are indeed at low frequencies within individual genetic groups. We find that genes showing associations with a number of complex traits are enriched for deleterious variants. Together these data are consistent with the dominance model of heterosis, in which complementation of numerous low-frequency, weakly deleterious variants contributes to hybrid vigor.

Population subdivision with migration can facilitate evolution on rugged fitness landscapes

Anne-Florence Bitbol, David J. Schwab
(Submitted on 1 Aug 2013)

We show that subdivision of an asexual population into demes connected by migration significantly accelerates the crossing of fitness valleys and plateaus over a wide parameter range, both with respect to the non-subdivided population and with respect to a single deme. We predict the existence of a parameter range where valley or plateau crossing by the metapopulation is as fast as that of the fastest deme, and we verify this prediction using stochastic simulations. Finally, we extend our work to the case of a large population connected by migration to one or several smaller islands.

Distortion of genealogical properties when the sample is very large

Anand Bhaskar, Andrew G. Clark, Yun S. Song
(Submitted on 1 Aug 2013)

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
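The source of the discrepancy described above can be illustrated with a one-generation calculation. The sketch below is not the authors' exact DTWF method; it assumes a haploid Wright-Fisher population of constant size `N` (function names are hypothetical) and compares the expected number of distinct parents of `n` lineages with an approximation that, like the coalescent, allows only pairwise mergers.

```python
def expected_parents_dtwf(n: int, N: int) -> float:
    """Expected number of distinct parents of n lineages, one generation
    back, when each lineage picks a parent uniformly among N individuals
    (haploid Wright-Fisher; multiple and simultaneous mergers allowed)."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


def expected_parents_pairwise(n: int, N: int) -> float:
    """First-order approximation that counts only pairwise mergers:
    each of the n*(n-1)/2 pairs merges with probability 1/N."""
    return n - n * (n - 1) / (2.0 * N)


if __name__ == "__main__":
    N = 1000
    for n in (10, 100, 500):
        exact = expected_parents_dtwf(n, N)
        approx = expected_parents_pairwise(n, N)
        print(f"n={n:4d}  DTWF={exact:8.2f}  pairwise-only={approx:8.2f}")
```

When `n` is much smaller than `N` the two values nearly coincide. As `n` approaches `N` they diverge, because several lineages often share a parent in the same generation.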

Maximum likelihood evidence for Neandertal admixture in Eurasian populations from three genomes

Konrad Lohse, Laurent A.F. Frantz
(Submitted on 31 Jul 2013)

Although there has been much interest in estimating divergence and admixture from genomic data, it has proven difficult to distinguish gene flow after divergence from alternative histories involving structure in the ancestral population. The lack of a formal test to distinguish these scenarios has sparked recent controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. We derive the probability of mutational configurations in non-recombining sequence blocks under alternative histories of divergence with admixture and ancestral structure. Dividing the genome into short blocks makes it possible to compute maximum likelihood estimates of parameters under both models. We apply this method to triplets of human and Neandertal genomes and quantify the relative support for models of long-term population structure in the ancestral African population and admixture from Neandertals into Eurasian populations after their expansion out of Africa. Our analysis allows us, for the first time, to formally reject a history of ancestral population structure and instead reveals strong support for admixture from Neandertals into Eurasian populations at a higher rate (3.4%-7.9%) than suggested previously.

Late-replicating CNVs as a source of new genes

David Juan, Daniel Rico, Tomas Marques-Bonet, Oscar Fernandez-Capetillo, Alfonso Valencia
(Submitted on 31 Jul 2013)

Asynchronous replication of the genome has been associated with different rates of point mutation and copy number variation (CNV) in human populations. Here, we explored whether the bias in the generation of CNV that is associated with DNA replication timing might have conditioned the birth of new protein-coding genes during evolution. We show that genes that were duplicated during primate evolution are more commonly found among the human genes located in late-replicating CNV regions. We traced the relationship between replication timing and the evolutionary age of duplicated genes. Strikingly, we found that there is a significant enrichment of evolutionarily younger duplicates in late-replicating regions of the human and mouse genomes. Indeed, the presence of duplicates in late-replicating regions gradually decreases as the evolutionary time since duplication extends. Our results suggest that the accumulation of recent duplications in late-replicating CNV regions is an active process influencing genome evolution.

SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching


Ilya Y. Zhbannikov, Samuel S. Hunter, Matthew L. Settles, James A. Foster
(Submitted on 31 Jul 2013)

With the advent of Next-Generation (NG) sequencing, it has become possible to sequence an entire genome quickly and inexpensively. However, in some experiments one only needs to extract and assemble a portion of the sequence reads, for example when performing transcriptome studies, sequencing mitochondrial genomes, or characterizing exomes. With the raw DNA library of a complete genome it would appear to be a trivial problem to identify reads of interest. But it is not always easy to incorporate well-known tools such as BLAST, BLAT, Bowtie, and SOAP directly into a bioinformatics pipeline before the assembly stage, either due to incompatibility with the assembler’s file inputs, or because it is desirable to incorporate information that must be extracted separately. For example, in order to incorporate flowgrams from a Roche 454 sequencer into the Newbler assembler it is necessary to first extract them from the original SFF files. We present SlopMap, a bioinformatics software utility which allows rapid identification of reads similar to provided target sequences from either a Roche 454 or an Illumina DNA library. With a simple and intuitive command-line interface along with file output formats compatible with assembly programs, SlopMap can be directly embedded in a biological data-processing pipeline without any additional programming work. In addition, SlopMap preserves the flowgram information needed by the Roche 454 assembler.
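The general idea of selecting reads by exact k-mer matching can be sketched as follows. This is an illustration only, not SlopMap's implementation or interface: the function names, the default `k`, and the `min_fraction` threshold are all assumptions, and the sketch ignores reverse complements, quality scores, and flowgrams.

```python
def build_kmer_index(targets, k):
    """Collect every k-mer that occurs in any target sequence."""
    index = set()
    for seq in targets:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index.add(seq[i:i + k])
    return index


def kmer_hit_fraction(read, index, k):
    """Fraction of the read's k-mers that match the index exactly."""
    read = read.upper()
    total = len(read) - k + 1
    if total <= 0:
        return 0.0  # read shorter than k: no k-mers to compare
    hits = sum(1 for i in range(total) if read[i:i + k] in index)
    return hits / total


def select_reads(reads, targets, k=11, min_fraction=0.5):
    """Keep reads whose share of matching k-mers reaches min_fraction."""
    index = build_kmer_index(targets, k)
    return [r for r in reads if kmer_hit_fraction(r, index, k) >= min_fraction]


if __name__ == "__main__":
    targets = ["ACGTACGTTAGCCGATAGGCTA"]
    reads = ["CGTTAGCCGATAGG", "TTTTTTTTTTTTTT"]
    print(select_reads(reads, targets, k=5))
```

Using a set makes each k-mer lookup constant time on average, so the cost of screening a read grows with its length and not with the size of the target set.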

Comprehensive analysis of imprinted genes in maize reveals limited conservation with other species and allelic variation for imprinting

Amanda J. Waters, Paul Bilinski, Steve R. Eichten, Matthew W. Vaughn, Jeffrey Ross-Ibarra, Mary Gehring, Nathan M. Springer
(Submitted on 29 Jul 2013)

In plants, a subset of genes exhibit imprinting in endosperm tissue such that expression is primarily from the maternal or paternal allele. Imprinting may arise as a consequence of mechanisms for silencing of transposons during reproduction, and in some cases imprinted expression of particular genes may provide a selective advantage such that it is conserved across species. Separate mechanisms for the origin of imprinted expression patterns and maintenance of these patterns may result in substantial variation in the targets of imprinting in different species. Here we present deep sequencing of RNAs isolated from reciprocal crosses of four diverse maize genotypes, providing a comprehensive analysis of imprinting in maize that allows evaluation of imprinting at more than 95% of endosperm-expressed genes. We find that over 500 genes exhibit statistically significant parent-of-origin effects in maize endosperm tissue, but focus our analyses on a subset of these genes that had >90% expression from the maternal allele (69 genes) or from the paternal allele (108 genes) in at least one reciprocal cross. Over 10% of imprinted genes show evidence of allelic variation for imprinting. A comparison of imprinting in maize and rice reveals that only 13% of genes with syntenic orthologs in both species exhibit conserved imprinting. Genes that exhibit conserved imprinting in maize relative to rice have elevated dN/dS ratios compared to other imprinted genes, suggesting a history of more rapid evolution. Together, these data suggest that imprinting only has functional relevance at a subset of loci that currently exhibit imprinting in maize.

The genome of the medieval Black Death agent

The genome of the medieval Black Death agent (extended abstract)
Ashok Rajaraman, Eric Tannier, Cedric Chauve
(Submitted on 29 Jul 2013)

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most Yersinia pestis extant species, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, aiming at reconstructing the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to an exceptional genome plasticity and increase in rearrangement rate.

The genomic impacts of drift and selection for hybrid performance in maize

Justin P. Gerke, Jode W. Edwards, Katherine E. Guill, Jeffrey Ross-Ibarra, Michael D. McMullen
(Submitted on 27 Jul 2013)

Modern maize breeding relies upon selection in inbreeding populations to improve the performance of cross-population hybrids. The United States Department of Agriculture – Agricultural Research Service reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest standing models of selection for hybrid performance. To investigate the genomic impact of this selection program, we used the Illumina MaizeSNP50 high-density SNP array to determine genotypes of progenitor lines and over 600 individuals across multiple cycles of selection. Consistent with previous research (Messmer et al., 1991; Labate et al., 1997; Hagdorn et al., 2003; Hinze et al., 2005), we found that genetic diversity within each population steadily decreases, with a corresponding increase in population structure. High marker density also enabled the first view of haplotype ancestry, fixation and recombination within this historic maize experiment. Extensive regions of haplotype fixation within each population are visible in the pericentromeric regions, where large blocks trace back to single founder inbreds. Simulation attributes most of the observed reduction in genetic diversity to genetic drift. Signatures of selection were difficult to observe in the background of this strong genetic drift, but heterozygosity in each population has fallen more than expected. Regions of haplotype fixation represent the most likely targets of selection, but as observed in other germplasm selected for hybrid performance (Feng et al., 2006), there is no overlap between the most likely targets of selection in the two populations. We discuss how this pattern is likely to occur during selection for hybrid performance, and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome.

Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays

Heejung Shim, Matthew Stephens
(Submitted on 27 Jul 2013)

Understanding how genetic variants influence cellular-level processes is an important step towards understanding how they influence important organismal-level traits, or “phenotypes”, including human disease susceptibility. To this end scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g. RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes, or windows of fixed length. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying “function” that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs) we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.
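To make the idea of a multi-scale representation concrete, the sketch below applies an unnormalized Haar transform to per-base read counts. The choice of the Haar basis, the function name, and the toy counts are assumptions for illustration; this is not the authors' implementation, and the step of testing coefficients for association with genotype is omitted.

```python
def haar_transform(signal):
    """Unnormalized Haar decomposition of a sequence whose length is a
    power of two. Returns (total, details), where details[0] holds the
    finest-scale differences and details[-1] the coarsest."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        # Differences capture local contrast at the current scale.
        details.append([approx[i] - approx[i + 1] for i in pairs])
        # Sums carry the signal forward to the next, coarser scale.
        approx = [approx[i] + approx[i + 1] for i in pairs]
    return approx[0], details


if __name__ == "__main__":
    counts = [4, 2, 5, 5]  # toy per-base read counts in one region
    total, details = haar_transform(counts)
    print(total)    # 16.0
    print(details)  # [[2.0, 0.0], [-4.0]]
```

Each coefficient summarizes a contrast at one location and scale, so a genotype effect confined to a narrow part of a region can show up in fine-scale coefficients even when the window total barely changes.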