Sequencing of 15,622 gene-bearing BACs reveals new features of the barley genome

María Muñoz-Amatriaín , Stefano Lonardi , MingCheng Luo , Kavitha Madishetty , Jan Svensson , Matthew Moscou , Steve Wanamaker , Tao Jiang , Andris Kleinhofs , Gary Muehlbauer , Roger Wise , Nils Stein , Yaqin Ma , Edmundo Rodriguez , Dave Kudrna , Prasanna R Bhat , Shiaoman Chao , Pascal Condamine , Shane Heinen , Josh Resnik , Rod Wing , Heather N Witt , Matthew Alpert , Marco Beccuti , Serdar Bozdag , Francesca Cordero , Hamid Mirebrahim , Rachid Ounit , Yonghui Wu , Frank You , Jie Zheng , Hana Šimková , Jaroslav Doležel , Jane Grimwood , Jeremy Schmutz , Denisa Duma , Lothar Altschmied , Tom Blake , Phil Bregitzer , Laurel Cooper , Muharrem Dilbirligi , Anders Falk , Leila Feiz , Andreas Graner , Perry Gustafson , Patrick Hayes , Peggy Lemaux , Jafar Mammadov , Timothy Close
doi: http://dx.doi.org/10.1101/018978

Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, since only 6,278 BACs in the physical map were sequenced, detailed fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15,622 BACs representing the minimal tiling path of 72,052 physically mapped gene-bearing BACs. This generated about 1.7 Gb of genomic sequence containing 17,386 annotated barley genes. Exploration of the sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high rates of recombination, there are also gene-dense regions with suppressed recombination. Knowledge of these deviant regions is relevant to trait introgression, genome-wide association studies, genomic selection model development and map-based cloning strategies. Sequences and their gene and SNP annotations can be accessed and exported via http://harvest-web.org/hweb/utilmenu.wc or through the software HarvEST:Barley (download from harvest.ucr.edu). In the latter, we have implemented a synteny viewer between barley and Aegilops tauschii to aid in comparative genome analysis.

A Chronological Atlas of Natural Selection in the Human Genome during the Past Half-million Years

Hang Zhou , Sile Hu , Rostislav Matveev , Qianhui Yu , Jing Li , Philipp Khaitovich , Li Jin , Michael Lachmann , Mark Stoneking , Qiaomei Fu , Kun Tang
doi: http://dx.doi.org/10.1101/018929

The spatiotemporal distribution of recent human adaptation is a long-standing question. We developed a new coalescent-based method that collectively assigns human genome regions to modes of neutrality or to positive, negative, or balancing selection. Most importantly, selection times were estimated for all positive selection signals; these ranged over the last half million years, spanning the emergence of anatomically modern humans (AMH). These selection time estimates were further supported by analyses of the genome sequences from three ancient AMHs and the Neanderthals. A series of brain function-related genes were found to carry signals of ancient selective sweeps, which may have defined the evolution of cognitive abilities either before Neanderthal divergence or during the emergence of AMH. In particular, signals of brain evolution in AMH are strongly related to Alzheimer’s disease pathways. In conclusion, this study reports a chronological atlas of natural selection in humans.

Theoretical consequences of the Mutagenic Chain Reaction for manipulating natural populations

Robert Unckless , Philipp Messer , Andrew Clark
doi: http://dx.doi.org/10.1101/018986

The use of recombinant genetic technologies for population manipulation has mostly remained an abstract idea due to the lack of a suitable means to drive novel gene constructs to high frequency in populations. Recently Gantz and Bier showed that the use of CRISPR/Cas9 technology could provide an artificial drive mechanism, the so-called Mutagenic Chain Reaction (MCR), which could lead to rapid fixation of even a deleterious introduced allele. We establish the equivalence of this system to models of meiotic drive and review the results of simple models showing that, when there is a fitness cost to the MCR allele, an internal equilibrium exists that is usually unstable. Introductions must be at a frequency above this critical point for the successful invasion of the MCR allele. These modeling results have important implications for application of MCR in natural populations.
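The threshold behaviour described in the abstract can be illustrated with a minimal deterministic sketch. This is not the authors' model: the ordering of events (conversion in the zygote, then viability selection), the parameter names `s` (fitness cost), `c` (conversion efficiency) and `h` (dominance of the cost), and the function names are all assumptions chosen for illustration. Under these assumptions and with full conversion (`c = 1`), an internal equilibrium exists at p* = (2s - 1)/s when s > 1/2, and it is unstable.

```python
def next_freq(p, s, c=1.0, h=0.5):
    """Drive-allele frequency after one generation (illustrative assumptions).

    p: current drive-allele frequency, s: fitness cost of the drive homozygote,
    c: fraction of heterozygotes converted to drive homozygotes,
    h: dominance of the cost in unconverted heterozygotes.
    """
    q = 1.0 - p
    # Random mating, then conversion of a fraction c of heterozygotes.
    drive_hom = p * p + 2.0 * p * q * c
    het = 2.0 * p * q * (1.0 - c)
    wild_hom = q * q
    # Viability selection on the post-conversion genotypes.
    w_drive, w_het, w_wild = 1.0 - s, 1.0 - h * s, 1.0
    mean_w = drive_hom * w_drive + het * w_het + wild_hom * w_wild
    return (drive_hom * w_drive + 0.5 * het * w_het) / mean_w


def threshold_full_conversion(s):
    """Internal equilibrium for c = 1 in this sketch; None if it does not exist."""
    if s <= 0.5 or s >= 1.0:
        return None
    return (2.0 * s - 1.0) / s


def iterate(p0, s, c=1.0, h=0.5, generations=200):
    """Iterate the recursion from a starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, c, h)
    return p
```

With `s = 0.7` the critical frequency in this sketch is about 0.571: `iterate(0.6, 0.7)` approaches fixation, whereas `iterate(0.5, 0.7)` approaches loss.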

Driven to Extinction: On the Probability of Evolutionary Rescue from Sex-Ratio Meiotic Drive

Robert Unckless , Andrew Clark
doi: http://dx.doi.org/10.1101/018820

Many evolutionary processes result in sufficiently low mean fitness that they pose a risk of species extinction. Sex-ratio meiotic drive was recognized by W.D. Hamilton (1967) to pose such a risk, because as the driving sex chromosome becomes common, the opposite sex becomes rare. We expand on Hamilton’s classic model by allowing for the escape from extinction due to evolution of suppressors of X and Y drivers. We explore differences in the two systems in their probability of escape from extinction. Several novel conclusions are evident, including that a) extinction time scales approximately with the log of population size, so that even large populations may go extinct quickly, b) extinction risk is driven by the relationship between female fecundity and drive strength, c) because of anisogamy and the fact that X and Y drive result in sex ratios skewed in opposite directions, systems with Y drive are much more likely to go extinct than those with X drive, and d) suppressors are most likely to become established when the strength of drive is intermediate, since weak drive leads to weak selection for suppression and strong drive leads to rapid extinction.

Controlling False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data

David M Rocke , Luyao Ruan , J. Jared Gossett , Blythe Durbin-Johnson , Sharon Aviran
doi: http://dx.doi.org/10.1101/018739

We review existing methods for the analysis of RNA-Seq data and place them in a common framework of a sequence of tasks that are usually part of the process. We show that many existing methods produce large numbers of false positives in cases where the null hypothesis is true by construction and where actual data from RNA-Seq studies are used, as opposed to simulations that make specific assumptions about the nature of the data. We show that some of those mathematical assumptions about the data likely are one of the causes of the false positives, and define a general structure that is not apparently subject to these problems. The best performance was shown by limma-voom and by some simple methods composed of easily understandable steps.

Fine-mapping cellular QTLs with RASQUAL and ATAC-seq

Natsuhiko Kumasaka , Andrew Knights , Daniel Gaffney
doi: http://dx.doi.org/10.1101/018788

When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest at two levels: population level differences between individuals and allelic differences between cis-haplotypes within individuals. We present RASQUAL (Robust Allele Specific QUAntitation and quality controL), a novel statistical approach for association mapping that integrates genetic effects and robust modelling of biases in next generation sequencing (NGS) data within a single, probabilistic framework. RASQUAL substantially improves causal variant localisation and sensitivity of association detection over existing methods in RNA-seq, DNaseI-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximise association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,706 independent caQTLs (FDR 10%) and illustrate how RASQUAL’s improved causal variant localisation provides powerful information for fine-mapping disease-associated variants. We also map “multipeak” caQTLs, identical genetic associations found across multiple, independent open chromatin regions, and illustrate how genetic signals in ATAC-seq data can be used to link distal regulatory elements with gene promoters. Our results highlight how joint modelling of population and allele-specific genetic signals can improve functional interpretation of noncoding variation.

The “Gini index” in genetics: measuring genetic architecture complexity of quantitative traits

Xia Shen
doi: http://dx.doi.org/10.1101/018713

Genetic architecture is a general term that is used and discussed very often in complex trait genetics. It refers to the number of functional loci involved in explaining variation of a complex trait and the distribution of genetic effects across these loci. Understanding the complexity of the genetic architecture of complex traits is essential for evaluating the potential power of mapping functional loci and predicting complex traits. However, there has been no quantitative measure of genetic architecture complexity, which makes it difficult to link results from genetic data analysis to this term. Inspired by the “Gini index” for measuring income distribution in economics, I develop a genetic architecture score (“GA score”) to measure genetic architecture complexity. Simulations indicate that the GA score is an effective measure of the complexity of the genetic architecture of complex traits.
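For readers unfamiliar with the economics analogy, here is a short sketch of the standard Gini coefficient applied to per-locus contributions. The abstract does not define the GA score, so this is only the textbook Gini index, not the author's statistic; the function name `gini` and the idea of feeding it per-locus variance contributions are assumptions for illustration.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

In this sketch, four loci with equal contributions give a coefficient of 0, while a single locus carrying all of the variance among four gives 0.75, the maximum (n - 1)/n for n = 4.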

Detecting recent selective sweeps while controlling for mutation rate and background selection

Christian D. Huber , Michael DeGiorgio , Ines Hellmann , Rasmus Nielsen
doi: http://dx.doi.org/10.1101/018697

A composite likelihood ratio test implemented in the program SweepFinder is a commonly used method for scanning a genome for recent selective sweeps. SweepFinder uses information on the spatial pattern of the site frequency spectrum (SFS) around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson-Kreitman-Aguadé test, we suggest adding fixed differences relative to an outgroup to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection modeled as a local reduction in the effective population size. Using simulations, we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone.
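As a reminder of what the SFS records, the sketch below tabulates an unfolded spectrum from per-site derived-allele counts. It is not SweepFinder's input format or likelihood; the function name and the convention that class 0 holds invariable ancestral sites and class n holds fixed differences relative to the outgroup are assumptions made for illustration.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for a sample of n chromosomes.

    derived_counts: derived-allele count at each site (0..n).
    Returns a list of length n + 1 whose entry k is the number of sites
    with k derived copies. In this sketch, entry 0 counts invariable
    ancestral sites and entry n counts fixed differences to the outgroup.
    """
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(n + 1)]
```

Restricting the analysis to segregating sites corresponds to using only entries 1 to n - 1; including entries 0 and n is what brings invariable sites and fixed differences into view.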

Surveying the relative impact of mRNA features on local ribosome profiling read density in 28 datasets.

Patrick O’Connor , Dmitry Andreev , Pavel Baranov
doi: http://dx.doi.org/10.1101/018762

Ribosome profiling is a promising technology for exploring gene expression. However, ribosome profiling data are characterized by a substantial number of outliers due to technical and biological factors. Here we introduce a simple computational method, Ribo-seq Unit Step Transformation (RUST), for the characterization of ribosome profiling data. We show that RUST is robust and outperforms conventional normalization techniques in the presence of sporadic noise. We used RUST to analyse 28 publicly available ribosome profiling datasets obtained from mammalian cells and tissues and from yeast. This revealed substantial protocol-dependent variation in the composition of footprint libraries. We selected a high-quality dataset to explore the mRNA features that affect local decoding rates and found that the identity of the amino acid encoded by the codon in the A-site is the major contributing factor, followed by the identity of the codon itself and then the amino acid in the P-site. We also found that bulky amino acids slow down ribosome movement when they occur within the peptide tunnel, and that proline residues may decrease or increase ribosome velocities depending on the context in which they occur. Moreover, we show that a few parameters obtained with RUST are sufficient for predicting experimental densities with high accuracy. Due to its robustness and low computational demand, RUST could be used for quick routine characterization of ribosome profiling datasets to assess their quality as well as for the analysis of the relative impact of mRNA sequence features on local decoding rates.
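The abstract does not spell out the transformation, so the following is only a guess at what a unit step transformation of footprint densities could look like. It assumes that each codon is scored 1 if its density exceeds the mean density of its own transcript and 0 otherwise; the threshold choice and the function name are assumptions, not details taken from the paper.

```python
def unit_step_transform(densities):
    """Binarize per-codon footprint densities against the transcript mean.

    Assumption: a codon scores 1 when its density is strictly above the
    mean over the transcript, and 0 otherwise.
    """
    if not densities:
        return []
    mean = sum(densities) / len(densities)
    return [1 if d > mean else 0 for d in densities]
```

Under this assumption an extreme outlier contributes a score of 1, the same as any other above-average codon, which is one way a step transformation could limit the influence of sporadic noise.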

Most viewed on Haldane’s Sieve: April 2014

The most viewed posts this month were: