Comprehensive analysis of imprinted genes in maize reveals limited conservation with other species and allelic variation for imprinting
Amanda J. Waters, Paul Bilinski, Steve R. Eichten, Matthew W. Vaughn, Jeffrey Ross-Ibarra, Mary Gehring, Nathan M. Springer
(Submitted on 29 Jul 2013)
In plants, a subset of genes exhibit imprinting in endosperm tissue such that expression is primarily from the maternal or paternal allele. Imprinting may arise as a consequence of mechanisms for silencing transposons during reproduction, and in some cases imprinted expression of particular genes may provide a selective advantage such that it is conserved across species. Separate mechanisms for the origin of imprinted expression patterns and maintenance of these patterns may result in substantial variation in the targets of imprinting in different species. Here we present deep sequencing of RNAs isolated from reciprocal crosses of four diverse maize genotypes, providing a comprehensive analysis of imprinting in maize that allows evaluation of imprinting at more than 95% of endosperm-expressed genes. We find that over 500 genes exhibit statistically significant parent-of-origin effects in maize endosperm tissue, but we focus our analyses on a subset of these genes that show >90% expression from the maternal allele (69 genes) or from the paternal allele (108 genes) in at least one reciprocal cross. Over 10% of imprinted genes show evidence of allelic variation for imprinting. A comparison of imprinting in maize and rice reveals that only 13% of genes with syntenic orthologs in both species exhibit conserved imprinting. Genes that exhibit conserved imprinting in maize relative to rice have elevated dN/dS ratios compared to other imprinted genes, suggesting a history of more rapid evolution. Together, these data suggest that imprinting only has functional relevance at a subset of loci that currently exhibit imprinting in maize.
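The >90% criterion above can be sketched as a per-gene call on allele-specific read counts from a single reciprocal cross. This is a minimal illustration, not the authors' pipeline: the function names, the read-count inputs and the "MEG"/"PEG" labels are hypothetical, and the statistical test for parent-of-origin effects is omitted. Because endosperm is triploid (two maternal genome copies to one paternal), a non-imprinted gene is expected to show about two-thirds maternal reads, so the sketch also reports the deviation from that baseline.

```python
from typing import Optional

# Expected maternal read fraction for a non-imprinted gene in triploid
# endosperm (2 maternal : 1 paternal genome copies).
EXPECTED_MATERNAL = 2 / 3


def classify_imprinting(maternal_reads: int, paternal_reads: int,
                        cutoff: float = 0.9) -> Optional[str]:
    """Hypothetical helper: label a gene in one reciprocal cross.

    Returns "MEG" (maternally expressed gene) if more than `cutoff` of the
    allele-informative reads come from the maternal allele, "PEG"
    (paternally expressed gene) if more than `cutoff` come from the paternal
    allele, and None otherwise.
    """
    total = maternal_reads + paternal_reads
    if total == 0:
        return None  # no allele-informative reads, so imprinting cannot be evaluated
    if maternal_reads / total > cutoff:
        return "MEG"
    if paternal_reads / total > cutoff:
        return "PEG"
    return None


def deviation_from_expected(maternal_reads: int, paternal_reads: int) -> float:
    """Maternal fraction minus the 2/3 baseline (positive means maternal bias)."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-informative reads")
    return maternal_reads / total - EXPECTED_MATERNAL
```

Allelic variation for imprinting could then appear as a gene that is called in one reciprocal cross but not in another.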
The genome of the medieval Black Death agent (extended abstract)
Ashok Rajaraman, Eric Tannier, Cedric Chauve
(Submitted on 29 Jul 2013)
The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most extant Yersinia pestis strains, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, which aim to reconstruct the organization of ancestral genomes by comparing extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to exceptional genome plasticity and an increased rearrangement rate.
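The comparative idea can be illustrated with a toy adjacency count: contigs that lie next to each other in several extant genomes are candidate adjacencies in the ancestral (here, medieval) genome. This is a deliberately simplified sketch, not the paleogenomics method used in the paper; such methods typically also account for the species phylogeny and enforce a consistent linear or circular order. The contig names and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def conserved_adjacencies(genome_orders, min_support=2, circular=False):
    """Return contig adjacencies observed in at least `min_support` genomes.

    genome_orders: list of contig orders, one list per extant genome.
    Adjacencies are stored as frozensets, so A-B and B-A count as the same
    adjacency (orientation is ignored in this toy version).
    If `circular` is True, the last and first contigs are also treated as
    adjacent, as on a circular bacterial chromosome.
    """
    counts = Counter()
    for order in genome_orders:
        pairs = list(zip(order, order[1:]))
        if circular and len(order) > 2:
            pairs.append((order[-1], order[0]))
        # Count each adjacency at most once per genome.
        for adj in {frozenset(p) for p in pairs}:
            counts[adj] += 1
    return {adj for adj, n in counts.items() if n >= min_support}
```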
The genomic impacts of drift and selection for hybrid performance in maize
Justin P. Gerke, Jode W. Edwards, Katherine E. Guill, Jeffrey Ross-Ibarra, Michael D. McMullen
(Submitted on 27 Jul 2013)
Modern maize breeding relies upon selection in inbreeding populations to improve the performance of cross-population hybrids. The United States Department of Agriculture – Agricultural Research Service reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest-standing models of selection for hybrid performance. To investigate the genomic impact of this selection program, we used the Illumina MaizeSNP50 high-density SNP array to determine genotypes of progenitor lines and over 600 individuals across multiple cycles of selection. Consistent with previous research (Messmer et al., 1991; Labate et al., 1997; Hagdorn et al., 2003; Hinze et al., 2005), we found that genetic diversity within each population steadily decreases, with a corresponding increase in population structure. High marker density also enabled the first view of haplotype ancestry, fixation and recombination within this historic maize experiment. Extensive regions of haplotype fixation within each population are visible in the pericentromeric regions, where large blocks trace back to single founder inbreds. Simulation attributes most of the observed reduction in genetic diversity to genetic drift. Signatures of selection were difficult to observe against the background of this strong genetic drift, but heterozygosity in each population has fallen more than expected. Regions of haplotype fixation represent the most likely targets of selection, but as observed in other germplasm selected for hybrid performance (Feng et al., 2006), there is no overlap between the most likely targets of selection in the two populations. We discuss how this pattern is likely to occur during selection for hybrid performance, and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome.
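Attributing diversity loss to drift rests on the standard Wright-Fisher expectation that heterozygosity decays by a factor of 1 - 1/(2N) per generation. The sketch below compares that expectation with a small stochastic simulation. It is illustrative only: it assumes a single biallelic locus, random mating and a constant effective size N, and the population sizes and generation counts used are hypothetical rather than those of the BSSS/BSCB1 program.

```python
import random


def expected_heterozygosity(h0: float, n: int, generations: int) -> float:
    """Expected heterozygosity after `generations` of pure drift (Wright-Fisher)."""
    return h0 * (1 - 1 / (2 * n)) ** generations


def simulate_heterozygosity(p0: float, n: int, generations: int,
                            replicates: int = 1000, seed: int = 0) -> float:
    """Mean expected heterozygosity 2p(1-p) across Wright-Fisher replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        p = p0
        for _ in range(generations):
            # Binomial sampling of 2N gene copies from the current frequency.
            copies = sum(rng.random() < p for _ in range(2 * n))
            p = copies / (2 * n)
        total += 2 * p * (1 - p)
    return total / replicates
```

A decline steeper than this drift-only curve corresponds to the observation above that heterozygosity has fallen more than expected.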
Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays
Heejung Shim, Matthew Stephens
(Submitted on 27 Jul 2013)
Understanding how genetic variants influence cellular-level processes is an important step towards understanding how they influence organismal-level traits, or “phenotypes”, including human disease susceptibility. To this end, scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g. RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes or windows of fixed length. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying “function” that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs), we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.
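The wavelet idea can be sketched briefly: decompose each sample's signal (for example, DNase-seq counts across a window) with a Haar wavelet transform, then test each coefficient, which captures the signal at a particular scale and location, for association with genotype. This is a simplified stand-in that assumes signal lengths are a power of two and uses a per-coefficient Pearson correlation; the paper's method instead combines evidence across scales and locations in a Bayesian model. All function names here are hypothetical.

```python
import math


def haar_transform(signal):
    """Orthonormal Haar wavelet transform (length must be a power of 2).

    Returns [overall scaling coefficient, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of 2")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = level + details  # coarser levels are placed in front
    return approx + details


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def wavelet_association(signals, genotypes):
    """Correlate each wavelet coefficient with genotype across samples."""
    coeffs = [haar_transform(s) for s in signals]
    return [pearson([c[k] for c in coeffs], genotypes)
            for k in range(len(coeffs[0]))]
```

Testing coefficients rather than window sums lets an effect confined to a few positions show up in the fine-scale coefficients covering those positions, instead of being diluted across a whole window.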
Robust forward simulations of recurrent positive selection
Lawrence H. Uricchio, Ryan D. Hernandez
(Submitted on 24 Jul 2013)
It is well known that recurrent positive selection reduces the amount of genetic variation at linked sites. In recent decades, analytical results have been proposed to quantify the magnitude of this reduction with simple Wright-Fisher models and diffusion approximations. However, extending these results to include interference between selected sites, arbitrary selection schemes, and complicated demographic processes has proved to be challenging. Forward simulation can provide insights into these processes, but few studies have examined recurrent positive selection in a forward simulation context due to computational constraints. Here, we extend the flexible forward simulator SFS_CODE to greatly improve the efficiency of simulations of recurrent positive selection. Forward simulations are computationally intensive and often necessitate rescaling of relevant parameters (e.g., population size and sequence length) to achieve computational feasibility. However, it is not obvious that parameter rescaling will maintain expected patterns of diversity in all parameter regimes. We develop a simple method for parameter rescaling that provides the best possible computational performance for a given error tolerance, and a detailed theoretical analysis of the robustness of rescaling across the parameter space. These results show that ad hoc approaches to parameter rescaling under the recurrent hitchhiking model may not always provide sufficiently accurate dynamics, potentially skewing patterns of diversity in simulated DNA sequences.
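A common form of the rescaling discussed above divides the population size by a factor λ and multiplies per-generation rates by the same factor. This keeps the population-scaled parameters θ = 4Nμ, ρ = 4Nr and 2Ns unchanged while shrinking the number of simulated generations by λ. The sketch below shows only that bookkeeping. The `Params` fields and the guard on s are illustrative assumptions, not SFS_CODE options, and the sketch does not implement the error-tolerance method the authors develop.

```python
from dataclasses import dataclass


@dataclass
class Params:
    n: int            # diploid population size N
    mu: float         # per-site, per-generation mutation rate
    r: float          # per-site, per-generation recombination rate
    s: float          # selection coefficient of beneficial mutations
    generations: int  # number of generations to simulate


def rescale(p: Params, factor: float, max_s: float = 0.1) -> Params:
    """Shrink N by `factor` while keeping 4N*mu, 4N*r and 2N*s constant.

    Time is compressed by the same factor. Because s grows with the factor,
    large factors push selection outside the small-s regime that diffusion
    arguments assume; `max_s` is a hypothetical guard against that.
    """
    new_s = p.s * factor
    if new_s > max_s:
        raise ValueError(f"rescaled s = {new_s:.3g} exceeds {max_s}")
    return Params(
        n=round(p.n / factor),
        mu=p.mu * factor,
        r=p.r * factor,
        s=new_s,
        generations=round(p.generations / factor),
    )
```

The guard illustrates the concern raised above: rescaling preserves the scaled parameters exactly, but not necessarily the dynamics once s becomes large.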
Genetics of single-cell protein abundance variation in large yeast populations
Frank W. Albert, Sebastian Treusch, Arthur H. Shockley, Joshua S. Bloom, Leonid Kruglyak
(Submitted on 25 Jul 2013)
Many DNA sequence variants influence phenotypes by altering gene expression. Our understanding of these variants is limited by the sample sizes of current studies and by measurements of mRNA rather than protein abundance. We developed a powerful method for identifying genetic loci that influence protein expression in very large populations of the yeast Saccharomyces cerevisiae. The method measures single-cell protein abundance through the use of green fluorescent protein (GFP) tags. We applied this method to 160 genes and detected many more loci per gene than previous studies. We also observed closer correspondence between loci that influence protein abundance and loci that influence mRNA abundance of a given gene. Most loci cluster at hotspot locations that influence multiple proteins, in some cases more than half of those examined. The variants that underlie these hotspots have profound effects on the gene regulatory network and provide insights into genetic variation in cell physiology between yeast strains.
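The abstract does not spell out how loci are mapped from single-cell measurements. One approach suited to very large populations, assumed here rather than taken from the text, is bulk segregant analysis: sort cells into high-GFP and low-GFP pools, sequence each pool, and look for markers where the parental allele frequency differs between pools. The sketch shows only that comparison. The input format, the function names and the 0.2 threshold are hypothetical, and a real analysis would also smooth along the genome and assess significance.

```python
def allele_frequency_difference(high_pool, low_pool):
    """Per-marker difference in parental allele frequency between pools.

    high_pool, low_pool: lists of (parent_a_reads, parent_b_reads) per
    marker, from cells with high and low GFP signal (hypothetical input).
    A locus affecting protein abundance should show a frequency shift
    between the pools; unlinked markers should stay near zero.
    Markers without reads in a pool get NaN.
    """
    diffs = []
    for (ha, hb), (la, lb) in zip(high_pool, low_pool):
        if ha + hb == 0 or la + lb == 0:
            diffs.append(float("nan"))
            continue
        diffs.append(ha / (ha + hb) - la / (la + lb))
    return diffs


def candidate_loci(diffs, threshold=0.2):
    """Indices of markers whose absolute frequency shift exceeds `threshold`.

    NaN entries never pass the comparison, so uncovered markers are skipped.
    """
    return [i for i, d in enumerate(diffs) if abs(d) > threshold]
```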
Speed of adaptation and genomic signatures in arms race and trench warfare models of host-parasite coevolution
Aurelien Tellier, Stefany Moreno-Game, Wolfgang Stephan
(Submitted on 25 Jul 2013)
Host and parasite population genomic data are increasingly used to discover novel major genes underlying coevolution, assuming that natural selection generates two distinguishable polymorphism patterns: selective sweeps and balancing selection. These genomic signatures would result from two coevolutionary dynamics: the trench warfare, with fast cycles of allele frequencies, and the arms race, with slow recurrent fixation of alleles. However, genome scans for selection have so far found few genes involved in coevolution in hosts. To address this issue, we build a gene-for-gene model incorporating genetic drift and mutation, and integrate coalescent simulations to study observable genomic signatures at host and parasite loci. Contrary to conventional wisdom, we show that coevolutionary cycles are not faster under the trench warfare model than under the arms race, except for large population sizes and high values of coevolutionary costs. Based on the generated SNP frequencies, the expected balancing selection signature under the trench warfare dynamics appears to be observable only in parasite sequences and only in a limited range of parameters: when effective population sizes are sufficiently large (>1000) and selection has been acting for a long time (>4N generations). On the other hand, the typical signature of the arms race dynamics, i.e. selective sweeps, can be detected in parasite populations and, to a lesser extent, in host populations even if coevolution is recent. We suggest studying signatures of coevolution via population genomics of parasites rather than hosts, and caution against inferring coevolutionary dynamics based on the speed of coevolution.
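To make the gene-for-gene setting concrete, here is a minimal deterministic haploid gene-for-gene recursion. Hosts carry a resistance or susceptibility allele, parasites carry a virulence or avirulence allele, and resistance and virulence each carry a fitness cost. This is a textbook-style simplification assumed for illustration, not the authors' model. It omits the genetic drift, mutation and coalescent simulations described above and assumes every host meets a parasite. The default parameter values are hypothetical.

```python
def gfg_step(r, v, s=0.3, c_h=0.05, c_p=0.1):
    """One generation of a haploid gene-for-gene model (no drift or mutation).

    r: frequency of the host resistance allele (strictly between 0 and 1)
    v: frequency of the parasite virulence allele (strictly between 0 and 1)
    s: fitness loss of an infected host
    c_h, c_p: costs of resistance and of virulence
    """
    if not (0 < r < 1 and 0 < v < 1):
        raise ValueError("allele frequencies must lie strictly in (0, 1)")
    # Resistant hosts are infected only by virulent parasites.
    w_res = (1 - c_h) * (1 - s * v)
    # Susceptible hosts are infected by every parasite genotype.
    w_sus = 1 - s
    # Virulent parasites pay a cost but infect every host;
    # avirulent parasites fail on resistant hosts.
    w_vir = 1 - c_p
    w_avir = 1 - r
    r_next = r * w_res / (r * w_res + (1 - r) * w_sus)
    v_next = v * w_vir / (v * w_vir + (1 - v) * w_avir)
    return r_next, v_next


def trajectory(r0, v0, generations, **kwargs):
    """Iterate gfg_step and return the list of (r, v) pairs, starting at (r0, v0)."""
    path = [(r0, v0)]
    for _ in range(generations):
        path.append(gfg_step(*path[-1], **kwargs))
    return path
```

Adding binomial sampling of finite host and parasite populations at each step would be the natural next move toward the stochastic model described in the abstract.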