High Frequency Haplotypes are Expected Events, not Historical Figures

Elsa G Guillot, Murray P Cox
doi: http://dx.doi.org/10.1101/022160

Cultural transmission of reproductive success states that successful men have more children and pass this greater fecundity to their offspring. Balaresque and colleagues found high frequency haplotypes in a Central Asian Y chromosome dataset, which they attribute to cultural transmission of reproductive success by prominent historical men, including Genghis Khan. Using coalescent simulation, we show that these high frequency haplotypes are expected simply by chance. Hence, an explanation invoking cultural transmission of reproductive success is statistically unnecessary.

Low but significant genetic differentiation underlies biologically meaningful phenotypic divergence in a large Atlantic salmon population

Tutku Aykanat, Susan E Johnston, Panu Orell, Eero Niemelä, Jaakko Erkinaro, Craig Primmer
doi: http://dx.doi.org/10.1101/022178

Despite decades of research assessing the genetic structure of natural populations, the biological meaning of low yet significant genetic divergence often remains unclear due to a lack of associated phenotypic and ecological information. At the same time, structured populations with low genetic divergence and overlapping boundaries can potentially provide excellent models to study the eco-evolutionary dynamics in cases where high resolution genetic markers and relevant phenotypic and life history information are available. Here, we combined SNP-based population inference with extensive phenotypic and life history data to identify potential biological mechanisms driving fine scale sub-population differentiation in Atlantic salmon (Salmo salar) from the Teno River, a major salmon river in Europe. Two sympatrically occurring sub-populations had low but significant genetic differentiation (FST = 0.018) and displayed marked differences in the distribution of life history strategies, including variation in juvenile growth rate, age at maturity and size within age classes. Large, late-maturing individuals were virtually absent from one of the two sub-populations and there were significant differences in juvenile growth rates and size-at-age after oceanic migration between individuals in the respective sub-populations. Our findings suggest that different eco-evolutionary processes affect each sub-population and that hybridization and subsequent selection may maintain low genetic differentiation without hindering adaptive divergence.
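
For readers unfamiliar with the FST statistic quoted above, the following is a minimal sketch of the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for a single biallelic locus in two equally sized sub-populations. It is not the estimator used in the preprint; the function name `fst_biallelic` and the example allele frequencies are hypothetical, chosen only for illustration.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus and two equally weighted sub-populations.

    p1, p2 are the frequencies of the same allele in each sub-population
    (hypothetical inputs; equal weighting is an assumption of this sketch).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    # Expected heterozygosity of the pooled population (HT).
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)

    # Mean expected heterozygosity within sub-populations (HS).
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0

    # A locus fixed for the same allele everywhere carries no information.
    if h_total == 0.0:
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Small frequency differences give small FST values.
    print(fst_biallelic(0.45, 0.55))
```

Under this definition, an allele frequency difference of 0.10 around a mean of 0.5 already yields an FST of about 0.01, which gives a rough sense of how subtle the divergence behind a value such as 0.018 can be.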

Worldwide patterns of human epigenetic variation

Oana Carja, Julia L MacIsaac, Sarah M Mah, Brenna M Henn, Michael S Kobor, Marcus W Feldman, Hunter B Fraser
doi: http://dx.doi.org/10.1101/021931

DNA methylation is an epigenetic modification, influenced by both genetic and environmental variation, that can affect transcription and many organismal phenotypes. Although patterns of DNA methylation have been shown to differ between human populations, it remains to be determined whether epigenetic diversity mirrors the patterns observed for DNA polymorphisms or gene expression levels. We measured DNA methylation at 480,000 sites in 34 individuals from five diverse human populations in the Human Genome Diversity Panel, and analyzed these together with single nucleotide polymorphisms (SNPs) and gene expression data. We found greater population-specificity of DNA methylation than of mRNA levels, which may be driven by the greater genetic control of methylation. This study provides insights into gene expression and its epigenetic regulation across populations and offers a deeper understanding of worldwide patterns of epigenetic diversity in humans.

Predicting genome sizes and restriction enzyme recognition-sequence probabilities across the eukaryotic tree of life

Santiago Herrera, Paula H. Reyes-Herrera, Timothy M. Shank
doi: http://dx.doi.org/10.1101/007781

High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes, generically known as restriction-site associated DNA sequencing (RAD-seq), is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is a knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project, and ultimately defines its success. This number can only be directly determined if a reference genome sequence is available, or it can be estimated if the genome size and restriction recognition sequence probabilities are known. However, both scenarios are uncommon for non-model species. Here, we performed systematic in silico surveys of recognition sequences, for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition-sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting “neutral” elements. Models based on genomic compositions are also effective tools to accurately calculate probabilities of recognition sequences across taxa, and can be applied to species for which reduced-representation data is available (including transcriptomes and “neutral” RAD-seq datasets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
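
The estimate mentioned in the abstract (number of sites from genome size and recognition-sequence probability) can be illustrated with a minimal sketch. It assumes the simplest possible composition model, in which bases are independent and only GC content matters; this is not the PredRAD pipeline, and the function names, the genome size and the GC content below are hypothetical values chosen for illustration.

```python
def site_probability(recognition_seq: str, gc_content: float) -> float:
    """Probability that a given position starts the recognition sequence.

    Assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2,
    a deliberately simple mononucleotide model (an assumption of this sketch).
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must lie in [0, 1]")
    p_gc = gc_content / 2.0
    p_at = (1.0 - gc_content) / 2.0
    prob = 1.0
    for base in recognition_seq.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base in recognition sequence: {base!r}")
    return prob


def expected_sites(genome_size_bp: int, recognition_seq: str, gc_content: float) -> float:
    """Expected number of recognition sites on one strand of a genome."""
    # Number of positions at which the full recognition sequence can start.
    positions = max(genome_size_bp - len(recognition_seq) + 1, 0)
    return positions * site_probability(recognition_seq, gc_content)


if __name__ == "__main__":
    # EcoRI (GAATTC) and SbfI (CCTGCAGG) in a hypothetical 1 Gb genome with 40% GC.
    for name, seq in (("EcoRI", "GAATTC"), ("SbfI", "CCTGCAGG")):
        print(name, round(expected_sites(1_000_000_000, seq, 0.40)))
```

Because the abstract reports that observed recognition-sequence frequencies vary strongly among taxonomic groups, a GC-only model like this one should be read as a rough first approximation rather than a substitute for the composition models the authors describe.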

How obstacles perturb population fronts and alter their genetic structure

Wolfram Moebius, Andrew W. Murray, David R. Nelson
doi: http://dx.doi.org/10.1101/021964

As populations spread into new territory, environmental heterogeneities can shape the population front and genetic composition. We study here the effect of one important building block of inhomogeneous environments, compact obstacles. With a combination of experiments, theory, and simulation, we show how isolated obstacles both create long-lived distortions of the front shape and amplify the effect of genetic drift. A system of bacteriophage T7 spreading on a spatially heterogeneous Escherichia coli lawn serves as an experimental model system to study population expansions. Using an inkjet printer, we create well-defined replicates of the lawn and quantitatively study the population expansion manifested in plaque growth. The transient perturbations of the plaque boundary found in the experiments are well described by a model in which the front moves with constant speed. Independent of the precise details of the expansion, we show that obstacles create a kink in the front that persists over large distances and is insensitive to the details of the obstacle’s shape. The small deviations between experimental findings and the predictions of the constant speed model can be understood with a more general reaction-diffusion model, which reduces to the constant speed model when the obstacle size is large compared to the front width. Using this framework, we demonstrate that frontier alleles that just graze the side of an isolated obstacle increase in abundance, a phenomenon we call ‘geometry-enhanced genetic drift’, complementary to the founder effect associated with spatial bottlenecks. Bacterial range expansions around nutrient-poor barriers and stochastic simulations confirm this prediction; the latter also highlight the effect of the obstacle on the genealogy of individuals at the front. We argue that related ideas and experimental techniques are applicable to a wide variety of more complex environments, leading to a better understanding of how environmental heterogeneities affect population range expansions.

The anatomical distribution of genetic associations

Alan B Wells, Nathan Kopp, Xiaoxiao Xu, David R O’Brien, Wei Yang, Arye Nehorai, Tracy L. Adair-Kirk, Raphael Kopan, Joseph D Dougherty
doi: http://dx.doi.org/10.1101/021824

Deeper understanding of the anatomical intermediaries for disease and other complex genetic traits is essential to understanding mechanisms and developing new interventions. Existing ontology tools provide functional annotations for many genes in the genome and they are widely used to develop mechanistic hypotheses based on genetic and transcriptomic data. Yet, information about where a set of genes is expressed may be equally useful in interpreting results and forming novel mechanistic hypotheses for a trait. Therefore, we developed a framework for statistically testing the relationship between gene expression across the body and sets of candidate genes from across the genome. We validated this tool and tested its utility on three applications. First, using thousands of loci identified by GWA studies, our framework identifies the number of disease-associated genes that have enriched expression in the disease-affected tissue. Second, we experimentally confirmed an underappreciated prediction highlighted by our tool: variation in skin-expressed genes is a major quantitative genetic modulator of white blood cell count – a trait considered to be a feature of the immune system. Finally, using gene lists derived from sequencing data, we show that human genes under constrained selective pressure are disproportionately expressed in nervous system tissues.

The two-speed genomes of filamentous pathogens: waltz with plants

Suomeng Dong, Sylvain Raffaele, Sophien Kamoun
doi: http://dx.doi.org/10.1101/021774

Fungi and oomycetes include deep and diverse lineages of eukaryotic plant pathogens. The last 10 years have seen the sequencing of the genomes of a multitude of species of these so-called filamentous plant pathogens. Already, fundamental concepts have emerged. Filamentous plant pathogen genomes tend to harbor large repertoires of genes encoding virulence effectors that modulate host plant processes. Effector genes are not randomly distributed across the genomes but tend to be associated with compartments enriched in repetitive sequences and transposable elements. These findings have led to the “two-speed genome” model in which filamentous pathogen genomes have a bipartite architecture with gene sparse, repeat rich compartments serving as a cradle for adaptive evolution. Here, we review this concept and discuss how plant pathogens are great model systems to study evolutionary adaptations at multiple time scales. We will also introduce the next phase of research on this topic.

Most viewed on Haldane’s Sieve: June 2015

The most viewed preprints on Haldane’s Sieve in June 2015 were:

Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation

David E Weinberg, Premal Shah, Stephen W Eichhorn, Jeffrey A Hussmann, Joshua B Plotkin, David P Bartel
doi: http://dx.doi.org/10.1101/021501

Ribosome-footprint profiling provides genome-wide snapshots of translation, but technical challenges can confound its analysis. Here, we use improved methods to obtain ribosome-footprint profiles and mRNA abundances that more faithfully reflect gene expression in Saccharomyces cerevisiae. Our results support proposals that both the beginning of coding regions and codons matching rare tRNAs are more slowly translated. They also indicate that emergent polypeptides with as few as three basic residues within a 10-residue window tend to slow translation. With the improved mRNA measurements, the variation attributable to translational control in exponentially growing yeast was less than previously reported, and most of this variation could be predicted with a simple model that considered mRNA abundance, upstream open reading frames, cap-proximal structure and nucleotide composition, and lengths of the coding and 5′-untranslated regions. Collectively, our results reveal key features of translational control in yeast and provide a framework for executing and interpreting ribosome-profiling studies.

Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment

Rob Patro, Geet Duggal, Carl Kingsford
doi: http://dx.doi.org/10.1101/021592

Transcript quantification is a central task in the analysis of RNA-seq data. Accurate computational methods for the quantification of transcript abundances are essential for downstream analysis. However, most existing approaches are much slower than is necessary for their degree of accuracy. We introduce Salmon, a novel method and software tool for transcript quantification that exhibits state-of-the-art accuracy while being significantly faster than most other tools. Salmon achieves this through the combined application of a two-phase inference procedure, a reduced data representation, and a novel lightweight read alignment algorithm. Salmon is written in C++11, and is available under the GPL v3 license as open-source software at https://combine-lab.github.io/salmon.