High performance computation of landscape genomic models integrating local indices of spatial association

Sylvie Stucki, Pablo Orozco-terWengel, Michael W. Bruford, Licia Colli, Charles Masembe, Riccardo Negrini, Pierre Taberlet, Stéphane Joost, the NEXTGEN Consortium
Comments: 1 figure in text, 1 figure in supplementary material
Subjects: Populations and Evolution (q-bio.PE)

Motivation: The increasing availability of high-throughput datasets requires powerful methods to support the detection of signatures of selection in landscape genomics.

Results: We present an integrated approach to studying signatures of local adaptation that provides rapid processing of whole-genome data and enables assessment of spatial association using molecular markers.

Availability: Samβada is open-source software written in C++, available at http://lasig.epfl.ch/sambada under the GNU GPL v3 license. Compiled versions are provided for Windows, Linux and Mac OS X.

Contact: stephane.joost@epfl.ch, sylvie.stucki@a3.epfl.ch. Supplementary material is available online.
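
The title refers to local indices of spatial association (LISA). As a rough illustration of what such an index measures, the sketch below computes local Moran's I, a standard LISA, for a binary marker genotype sampled at point locations. It assumes binary distance-band neighbours within a hypothetical `bandwidth` and row-standardised weights. It is not Samβada's code, and Samβada's weighting scheme and significance testing may differ.

```python
import math


def local_morans_i(values, coords, bandwidth):
    """Local Moran's I for each individual (a sketch, not Samβada's code).

    values    -- genotype value per individual, e.g. 1 if a marker is present
    coords    -- (x, y) sampling location per individual
    bandwidth -- hypothetical distance band: neighbours within it get weight 1
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    m2 = sum(d * d for d in z) / n           # variance-like scaling term
    result = []
    for i in range(n):
        # Neighbours of i within the distance band (i itself excluded).
        nbrs = [j for j in range(n)
                if j != i and math.dist(coords[i], coords[j]) <= bandwidth]
        if not nbrs or m2 == 0:
            result.append(0.0)               # no neighbours or no variation
            continue
        # Row-standardised weights: the spatial lag is the neighbours' mean z.
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        result.append(z[i] * lag / m2)
    return result


# Two spatially separated clusters, each genetically uniform for the marker.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
values = [1, 1, 1, 0, 0, 0]
print(local_morans_i(values, coords, bandwidth=2.0))  # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Positive values flag individuals surrounded by similar genotypes (spatial clustering); negative values flag local outliers whose genotype differs from their neighbours'.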


High-resolution transcriptome analysis with long-read RNA sequencing

Hyunghoon Cho, Joe Davis, Xin Li, Kevin S. Smith, Alexis Battle, Stephen B. Montgomery
Comments: 29 pages, 8 figures, 11 supplementary figures
Subjects: Genomics (q-bio.GN)

RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances rapidly increase read lengths, it remains unclear to what degree differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for the lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, although the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer reads, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.
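
The claim that longer reads reduce ambiguity in assigning reads to transcripts can be made concrete with a toy example. The sketch assumes two made-up isoforms (`EXON_A`, `EXON_B`, `EXON_C` are hypothetical sequences) that differ only by a skipped exon, error-free reads, and exact substring matching. These are simplifying assumptions; real quantification tools typically assign ambiguous reads probabilistically rather than by exact matching.

```python
# Hypothetical exon sequences (made up for illustration only).
EXON_A = "ACGTACGTAC"
EXON_B = "GGCCTTAAGG"
EXON_C = "TTGCAACGTT"

# Two isoforms of one gene: isoform 2 skips exon B.
ISOFORMS = {
    "iso1": EXON_A + EXON_B + EXON_C,
    "iso2": EXON_A + EXON_C,
}


def compatible_isoforms(read, isoforms):
    """Names of isoforms whose spliced sequence contains the read exactly."""
    return [name for name, seq in isoforms.items() if read in seq]


def unique_fraction(read_len, source="iso1", isoforms=ISOFORMS):
    """Fraction of all reads of length read_len tiled across the source
    isoform that are compatible with exactly one isoform."""
    seq = isoforms[source]
    reads = [seq[i:i + read_len] for i in range(len(seq) - read_len + 1)]
    unique = sum(1 for r in reads if len(compatible_isoforms(r, isoforms)) == 1)
    return unique / len(reads)


for length in (5, 10, 15, 25):
    print(length, round(unique_fraction(length), 2))
```

Short reads that fall entirely within a shared exon match both isoforms, whereas reads long enough to span the skipped exon or its junctions can be assigned unambiguously.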

Cis-regulatory elements and human evolution

Adam Siepel, Leonardo Arbiza

Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider “new frontiers” in this field stemming from recent research on transcriptional regulation.

Quantifying the effects of anagenetic and cladogenetic evolution

Krzysztof Bartoszek

An ongoing debate in evolutionary biology is whether phenotypic change occurs predominantly around the time of speciation or instead accumulates gradually over time. In this work I propose a general framework incorporating both types of change, quantify the effects of speciational change via the correlation between species, and attribute the proportion of change to each type. I discuss the results of parameter estimation for hominoid body size in this light. I derive mathematical formulae related to this problem, namely the probability generating functions of the number of speciation events along a randomly drawn lineage and from the most recent common ancestor of two randomly chosen tip species, for a conditioned Yule tree. Additionally, I obtain in closed form the variance of the distance from the root to the most recent common ancestor of two randomly chosen tip species.
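
As an informal check on the kind of quantity studied here, the sketch below grows Yule tree shapes conditioned on the number of tips and estimates, by Monte Carlo, the mean number of speciation events on the path from the root to a randomly drawn tip. Under a pure-birth process every extant lineage is equally likely to split next, so branch lengths can be ignored when counting events. The sketch counts the root split as a speciation event, which is our convention and may differ from the paper's; under it the mean approaches 2(H_n - 1), with H_n the n-th harmonic number. The paper's full probability generating functions are not reproduced here.

```python
import random


def yule_tip_depths(n_tips, rng):
    """Speciation events on each root-to-tip path of one Yule tree shape."""
    depths = [0]                        # a single ancestral lineage
    while len(depths) < n_tips:
        i = rng.randrange(len(depths))  # every lineage equally likely to split
        depths[i] += 1                  # the split adds one event to both daughters
        depths.append(depths[i])
    return depths


def mean_depth_monte_carlo(n_tips, reps, seed=1):
    """Average event count along one randomly drawn lineage per simulated tree."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        depths = yule_tip_depths(n_tips, rng)
        total += depths[rng.randrange(n_tips)]
    return total / reps


def mean_depth_exact(n_tips):
    """2 * (H_n - 1): the known mean root-to-tip depth of a Yule tree shape."""
    return 2.0 * sum(1.0 / k for k in range(2, n_tips + 1))


print(mean_depth_monte_carlo(10, 50000), mean_depth_exact(10))
```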

The largest strongly connected component in Wakeley et al.’s cyclical pedigree model

Jochen Blath, Stephan Kadow, Marcel Ortgiese
Comments: 21 pages, 2 figures
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

We establish a link between Wakeley et al.’s (2012) cyclical pedigree model from population genetics and a randomized directed configuration model (DCM) considered by Cooper and Frieze (2004). We then combine this link with asymptotic results for the in-degree distribution of the corresponding DCM to compute the asymptotic size of the largest strongly connected component $S^N$ (where $N$ is the population size) of the DCM and hence of the pedigree. The size of this giant component can be characterized explicitly (it amounts to approximately 80% of the total population size) and thus contributes to a reduced ‘pedigree effective population size’. In addition, the second-largest strongly connected component has size only $O(\log N)$. Moreover, we describe the size and structure of the ‘domain of attraction’ of $S^N$. In particular, we show that, with high probability, for any individual the shortest ancestral line reaches $S^N$ after $O(\log \log N)$ generations, while almost all other ancestral lines take at most $O(\log N)$ generations.
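
The roughly 80% figure can be explored numerically. The sketch below builds a random directed graph in which each of N individuals points to two distinct individuals drawn uniformly at random (its "parents"), so out-degrees are exactly 2 and in-degrees are approximately Poisson(2). This is our simplified stand-in for the DCM, not the paper's exact construction. It finds the largest strongly connected component with Kosaraju's algorithm and compares its relative size with a branching-process heuristic: every individual's ancestral lines reach the giant component, and an individual is reached from it (through descendants) unless its Poisson(2) descendant tree dies out, which happens with probability q solving q = exp(-2(1 - q)). That heuristic gives 1 - q, roughly 0.80.

```python
import math
import random


def random_parent_graph(n, rng):
    """adj[i] = two distinct 'parents' of individual i, drawn uniformly
    (self-parenting is not excluded; a negligible simplification)."""
    return [rng.sample(range(n), 2) for _ in range(n)]


def largest_scc_size(adj):
    """Size of the largest strongly connected component (Kosaraju)."""
    n = len(adj)
    radj = [[] for _ in range(n)]
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            radj[v].append(u)
    # Pass 1: iterative DFS on the graph, recording vertices by finishing time.
    seen = [False] * n
    order = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(adj[v])))
                    break
            else:                       # all successors of u explored
                stack.pop()
                order.append(u)
    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # each tree found is one strongly connected component.
    assigned = [False] * n
    best = 0
    for s in reversed(order):
        if assigned[s]:
            continue
        assigned[s] = True
        size, stack = 0, [s]
        while stack:
            u = stack.pop()
            size += 1
            for v in radj[u]:
                if not assigned[v]:
                    assigned[v] = True
                    stack.append(v)
        best = max(best, size)
    return best


def heuristic_fraction(mean_children=2.0, iters=200):
    """1 - q, where q = exp(-mean_children * (1 - q)) is the extinction
    probability of a Poisson(mean_children) descendant tree."""
    q = 0.0
    for _ in range(iters):
        q = math.exp(-mean_children * (1.0 - q))
    return 1.0 - q


n = 20000
adj = random_parent_graph(n, random.Random(42))
print(largest_scc_size(adj) / n, heuristic_fraction())
```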

Tractable stochastic models of evolution for loosely linked loci

Paul A. Jenkins, Paul Fearnhead, Yun S. Song
Comments: 32 pages, 1 figure
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

A fundamental task in statistical genetics is to compute the sampling distribution, or likelihood, of a sample of genetic data under a stochastic evolutionary model. For DNA sequence data with inter-locus recombination, standard models include the Wright-Fisher diffusion with recombination and its dual genealogical process, the ancestral recombination graph. However, under neither of these models is the sampling distribution available in closed form, and computing it is extremely difficult. In this paper we derive two new stochastic population genetic models, one a diffusion and the other a coalescent process, which are much simpler than the standard models but capture their key properties for large recombination rates. For the new diffusion, we show that the sampling distribution is available in closed form. We further demonstrate that, when the sampling distribution is written as an asymptotic expansion in inverse powers of the recombination parameter, the sampling distributions of the two new models agree with those of the standard models up to the first two orders.
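
Schematically, and in our own notation (the abstract fixes none), write $q(\mathbf{n})$ for the probability of a sample configuration $\mathbf{n}$ under the standard model and $\rho$ for the recombination parameter. The expansion referred to above then has the form

$$
q(\mathbf{n}) = q_0(\mathbf{n}) + \frac{q_1(\mathbf{n})}{\rho} + O\!\left(\rho^{-2}\right), \qquad \rho \to \infty,
$$

and agreement "up to the first two orders" means that the simplified models reproduce both $q_0$ and $q_1$. As $\rho \to \infty$ the loci become unlinked, so the leading term $q_0$ is the product of one-locus sampling distributions, one per locus.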

Dynamics of a combined Medea-underdominant population transformation system

Chaitanya Gokhale, Richard Guy Reeves, Floyd A Reed

Transgenic constructs intended to become stably established at high frequencies in wild populations have been demonstrated to “drive” from low frequencies in experimental insect populations. Linking such population transformation constructs to genes that render the insects unable to transmit pathogens could eventually be used to stop the spread of vector-borne diseases such as malaria and dengue. Generally, only population transformation constructs with a single transgenic drive mechanism have been envisioned. Using a theoretical modelling approach, we describe the predicted properties of a construct combining autosomal Medea and underdominant population transformation systems. We show that, when combined, they can exhibit synergistic properties that in a broad range of circumstances surpass those of the single systems. With combined systems, intentional population transformation and its reversal can be achieved readily. Combined constructs also enhance the capacity to geographically restrict transgenic constructs to targeted populations. We anticipate that these properties will be of particular value in attracting regulatory approval and public acceptance of this novel technology.
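
To make the two ingredients concrete, the sketch below iterates textbook single-locus versions of each system separately; it does not implement the paper's combined construct. The Medea model assumes random mating, complete lethality of +/+ offspring of M/+ mothers, and no fitness cost to carriers. The underdominance model assumes heterozygote fitness 1 - s and equal homozygote fitnesses, which places its unstable threshold at an allele frequency of 1/2. All parameter values are illustrative assumptions. Medea spreads from a low starting frequency, whereas underdominance goes to fixation only above its threshold and is otherwise lost.

```python
def medea_step(mm, mp, pp):
    """One generation of a single-locus Medea model (M = Medea, + = wild type)."""
    p = mm + 0.5 * mp                 # Medea allele frequency in adults (fathers)
    q = 1.0 - p
    # Offspring genotype frequencies by mother's genotype, with random mating.
    new_mm = mm * p + mp * 0.5 * p
    new_mp = mm * q + mp * 0.5 + pp * p
    # +/+ offspring of M/+ mothers die (maternal effect); those of +/+ mothers live.
    new_pp = pp * q
    total = new_mm + new_mp + new_pp  # surviving fraction
    return new_mm / total, new_mp / total, new_pp / total


def medea_frequency_after(p0, generations):
    g = (p0 * p0, 2 * p0 * (1 - p0), (1 - p0) ** 2)  # Hardy-Weinberg start
    for _ in range(generations):
        g = medea_step(*g)
    return g[0] + 0.5 * g[1]


def underdominance_step(p, s):
    """Heterozygote fitness 1 - s, both homozygotes fitness 1 (assumed)."""
    q = 1.0 - p
    w_bar = p * p + 2 * p * q * (1 - s) + q * q
    return p * (p + q * (1 - s)) / w_bar


def underdominance_frequency_after(p0, s, generations):
    p = p0
    for _ in range(generations):
        p = underdominance_step(p, s)
    return p


print(medea_frequency_after(0.01, 1000))               # spreads from 1%
print(underdominance_frequency_after(0.45, 0.5, 200))  # below threshold: lost
print(underdominance_frequency_after(0.55, 0.5, 200))  # above threshold: fixed
```

The threshold behaviour of underdominance is what makes reversal (releasing enough wild-type individuals to push the frequency back below 1/2) and geographic confinement conceivable; combining it with Medea is the paper's subject and is not modelled here.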

Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2

Michael I Love, Wolfgang Huber, Simon Anders

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.
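
The effect of shrinking fold-change estimates can be illustrated with a conjugate normal model. Suppose a gene's true log2 fold change has a zero-centred normal prior with variance τ² (`prior_var`), and its maximum-likelihood estimate has standard error SE. The posterior mean then multiplies the estimate by τ²/(τ² + SE²). This is a simplified illustration of the principle, not DESeq2's implementation, which works within a negative binomial GLM and estimates the prior from the data; here τ² is simply assumed.

```python
def shrink_lfc(lfc_mle, se, prior_var):
    """Posterior mean of each log2 fold change under a N(0, prior_var) prior
    and a normal likelihood with the given standard errors (a sketch)."""
    return [b * prior_var / (prior_var + s * s) for b, s in zip(lfc_mle, se)]


# Two genes with the same raw estimate: one well measured (small SE),
# one noisy, e.g. low counts (large SE). The prior variance is assumed.
print(shrink_lfc([3.0, 3.0], [0.2, 2.0], prior_var=1.0))
```

The noisy estimate is pulled from 3.0 to 0.6, while the precise one barely moves (to about 2.88). Ranking genes by shrunken estimates therefore favours effects that are both large and well supported, which is the sense in which shrinkage focuses on the strength of differential expression.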

Lighter: fast and memory-efficient error correction without counting

Li Song, Liliana Florea, Ben Langmead

Lighter is a fast and memory-efficient tool for correcting sequencing errors in high-throughput sequencing datasets. Lighter avoids counting k-mers in the sequencing reads. Instead, it uses a pair of Bloom filters, one populated with a sample of the input k-mers and the other populated with k-mers likely to be correct based on a simple test. As long as the sampling fraction is adjusted in inverse proportion to the depth of sequencing, the Bloom filter size can be held constant while maintaining near-constant accuracy. Lighter is easily applied to very large sequencing datasets. It is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy. Lighter is free open source software available from https://github.com/mourisl/Lighter/.
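
Two ideas in this description can be sketched generically. A Bloom filter is a bit array queried through several hash functions; it can return false positives but never false negatives. K-mer subsampling inserts each k-mer occurrence with some fraction α. The code below is a minimal illustration, not Lighter's implementation; the hash scheme, filter size, k, and α are assumptions. It also shows why α should scale inversely with depth. A k-mer seen m times is sampled at least once with probability 1 - (1 - α)^m. A correct k-mer seen about `depth` times therefore keeps roughly the same chance of being sampled when α·depth is held fixed, whereas a k-mer created by a single sequencing error is sampled with probability only α.

```python
import hashlib
import random


class BloomFilter:
    """Bit array with num_hashes probes per item (double hashing)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def sample_kmers(reads, k, alpha, bloom, rng):
    """Insert each k-mer occurrence into the filter with probability alpha."""
    for read in reads:
        for i in range(len(read) - k + 1):
            if rng.random() < alpha:
                bloom.add(read[i:i + k])


def prob_sampled(occurrences, alpha):
    """Chance that a k-mer seen `occurrences` times is sampled at least once."""
    return 1.0 - (1.0 - alpha) ** occurrences


# Holding alpha * depth fixed (here alpha = 1 / depth, an assumed constant)
# keeps a correct k-mer's chance of being sampled roughly constant.
for depth in (20, 80):
    alpha = 1.0 / depth
    print(depth, round(prob_sampled(depth, alpha), 3), prob_sampled(1, alpha))
```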

Reconstructing Austronesian population history in Island Southeast Asia

Mark Lipson, Po-Ru Loh, Nick Patterson, Priya Moorjani, Ying-Chin Ko, Mark Stoneking, Bonnie Berger, David Reich

Austronesian languages are spread across half the globe, from Easter Island to Madagascar. Evidence from linguistics and archaeology indicates that the “Austronesian expansion,” which began 4,000–5,000 years ago, likely had roots in Taiwan, but the ancestry of present-day Austronesian-speaking populations remains controversial. Here, focusing primarily on Island Southeast Asia, we analyze genome-wide data from 56 populations using new methods for tracing ancestral gene flow. We show that all sampled Austronesian groups harbor ancestry that is more closely related to aboriginal Taiwanese than to any present-day mainland population. Surprisingly, western Island Southeast Asian populations have also inherited ancestry from a source nested within the variation of present-day populations speaking Austro-Asiatic languages, which have historically been spoken almost exclusively on the mainland. Thus, either there was once a substantial Austro-Asiatic presence in Island Southeast Asia, or Austronesian speakers migrated to and through the mainland, admixing there before continuing to western Indonesia.