Simple genetic models for autism spectrum disorder
Swagatam Mukhopadhyay, Michael Wigler, Dan Levy
Exploring the interplay between new mutation, transmission, and gender bias in genetic disease requires formal quantitative modeling. Autism spectrum disorders offer an ideal case: they are genetic in origin, complex, and show a gender bias. The high reproductive costs of autism ensure that most strongly associated genetic mutations are short-lived, and indeed the disease exhibits both transmitted and de novo components. There is a large body of both epidemiologic and genomic data that greatly constrain the genetic mechanisms that may contribute to the disorder. We develop a computational framework that assumes classes of additive variants, each member of a class having equal effect. We restrict our initial exploration to single-class models, each having three parameters. Only one model matches epidemiological data. It also independently matches the incidence of de novo mutation in simplex families, the gender bias in unaffected siblings in simplex populations, and rates of mutation in target genes. This model makes strong and as yet not fully tested predictions, namely that females are the primary carriers in cases of genetic transmission, and that the incidence of de novo mutation in target genes for families at high risk for autism is not especially elevated. In its simplicity, this model does not account for MZ twin concordance or the distorted gender bias of high-functioning children with ASD, and does not accommodate all the known mechanisms contributing to ASD. We point to the next steps in applying the same computational framework to explore more complex models.
Genetic variability under the seed bank coalescent
Jochen Blath, Bjarki Eldon, Adrian Casanova, Noemi Kurt, Maite Wilke-Berenguer
We analyse patterns of genetic variability of populations in the presence of a large seed bank with the help of a new coalescent structure called the seed bank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seed banks, if the seed bank size and individual dormancy times are of the same order as the active population. Mutations appear as Poisson processes on the active lineages, and potentially at reduced rate also on the dormant lineages. The presence of 'dormant' lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this we provide a Wright-Fisher model with seed bank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seed bank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seed bank, are compared. The effect of a seed bank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seed bank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect the presence of a large seed bank in genetic data.
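The summary statistics named in this abstract (segregating sites, pairwise differences, singletons, and the site-frequency spectrum) can be computed from a sample of aligned sequences. The following is a minimal sketch, not the authors' code: it assumes a hypothetical input format of 0/1 haplotypes with 1 marking the derived allele, and the function name `summary_statistics` is invented for illustration. It does not simulate the seed bank coalescent itself.

```python
from itertools import combinations


def summary_statistics(haplotypes):
    """Compute basic diversity statistics from 0/1 haplotypes.

    haplotypes: list of equal-length lists, one per sampled sequence,
    where 1 marks the derived allele (an assumed input format).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])

    # Number of sampled sequences carrying the derived allele at each site.
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]

    # A site is segregating if both alleles are present in the sample.
    segregating = sum(1 for c in derived_counts if 0 < c < n)

    # Singletons: derived allele carried by exactly one sequence.
    singletons = sum(1 for c in derived_counts if c == 1)

    # Unfolded site-frequency spectrum: sfs[i - 1] counts sites whose
    # derived allele appears in exactly i sequences, for i = 1 .. n - 1.
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:
            sfs[c - 1] += 1

    # Mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    mean_pairwise = total / len(pairs)

    return {
        "segregating_sites": segregating,
        "singletons": singletons,
        "sfs": sfs,
        "mean_pairwise_differences": mean_pairwise,
    }


example = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
stats = summary_statistics(example)
```

Comparing such statistics between samples simulated with and without a seed bank is the kind of contrast the abstract describes; the simulation step is left out here because its rates are not given in the abstract.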
Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation
Jinmyung Choi, Parisa Shooshtari, Kaitlin E Samocha, Mark J Daly, Chris Cotsapas
Using robust, integrated analysis of multiple genomic datasets, we show that genes depleted for non-synonymous de novo mutations form a subnetwork of 72 members under strong selective constraint. We further show this subnetwork is preferentially expressed in the early development of the human hippocampus and is enriched for genes mutated in neurological, but not other, Mendelian disorders. We thus conclude that carefully orchestrated developmental processes are under strong constraint in early brain development, and perturbations caused by mutation have adverse outcomes subject to strong purifying selection. Our findings demonstrate that selective forces can act on groups of genes involved in the same process, supporting the notion that adaptation can act coordinately on multiple genes. Our approach provides a statistically robust, interpretable way to identify the tissues and developmental times where groups of disease genes are active. Our findings highlight the importance of considering the interactions between genes when analyzing genome-wide sequence data.
RiboDiff: Detecting Changes of Translation Efficiency from Ribosome Footprints
Yi Zhong, Theofanis Karaletsos, Philipp Drewe, Vipin Thankam T Sreedharan, Kamini Singh, Hans-Guido Wendel, Gunnar Rätsch
Motivation: Deep-sequencing-based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. To decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed. Results: We present a statistical framework and analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy. Availability: Source code and documentation are available at http://github.com/ratschlab/ribodiff. Supplementary Material can be found at http://bioweb.me/ribo.
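To make the confounding concrete, the sketch below treats translation efficiency as the ratio of ribosome footprint counts to mRNA counts and reports its log2 change between two conditions. This is only an illustration of the quantity being tested: it is not RiboDiff's generalized linear model, it performs no dispersion estimation or significance test, and the function name, the pseudocount, and the assumption that counts are already normalized for library size are all introduced here.

```python
import math


def log2_te_change(ribo_ctrl, rna_ctrl, ribo_trt, rna_trt, pseudocount=1.0):
    """Log2 fold change in translation efficiency (treatment vs. control).

    Translation efficiency is taken as footprint count / mRNA count.
    Counts are assumed to be library-size normalized already; the
    pseudocount (an assumption) avoids division by zero.
    """
    te_ctrl = (ribo_ctrl + pseudocount) / (rna_ctrl + pseudocount)
    te_trt = (ribo_trt + pseudocount) / (rna_trt + pseudocount)
    return math.log2(te_trt / te_ctrl)


# Footprints double while mRNA is unchanged: efficiency changes.
translational = log2_te_change(99, 99, 199, 99)

# Footprints and mRNA both double: the footprint change is explained
# by transcription alone, so efficiency is unchanged.
transcriptional = log2_te_change(99, 99, 199, 199)
```

The second case is why footprint counts alone are not enough: a gene can show twice as many footprints with no change in how efficiently each transcript is translated.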
New Routes to Phylogeography
Nicola De Maio, Chieh-Hsi Wu, Kathleen M O’Reilly, Daniel Wilson
(Submitted on 27 Mar 2015)
Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.
Long live the alien: studying the fate of the genomic diversity along the long-term dynamics of an extremely successful invader, the crested porcupine.
Emiliano Trucchi, Benoit Facon, Paolo Gratton, Emiliano Mori, Nils Chr Stenseth, Sissel Jentoft
Describing long-term evolutionary trajectories of alien species is a fundamental, although rarely possible, step to understand the pivotal drivers of successful invasions. Here, we tackled this task by investigating the genetic structure of the crested porcupine (Hystrix cristata), whose invasion of Italy started about 1500 years ago. Using genome-wide RAD markers, we explored the demographic processes that shaped, and are shaping, the gene pool of the expanding invasive populations and compared their genetic diversity with that of native and invasive populations of both African porcupine species (crested and Cape, H. africaeaustralis). Through coalescence-based demographic reconstructions, we demonstrated that the bottleneck at introduction was mild and did not severely affect the reservoir of genetic diversity. Our data also highlighted a marked geographic structure in the invasive populations, indicating that they are likely the result of multiple introduction events. Nevertheless, both the invasive populations and their source show a lower level of diversity relative to other native populations from Sub-Saharan and South Africa, suggesting that demographic history before introduction may have played a role in forging a successful invader. Finally, we showed that the current spatial expansion at the northern boundary of the range is following a leading-edge model characterized by a general reduction of genetic diversity towards the edge of the expanding range. Consistently, random fixation of alleles through gene-surfing seems a more likely explanation than adaptive divergence for the distribution of the few outlier loci with highly divergent frequencies between core and newly colonized areas.
A simple biophysical model predicts more rapid accumulation of hybrid incompatibilities in small populations
Bhavin S. Khatri, Richard A. Goldstein
Comments: 13 pages, 6 figures
Subjects: Populations and Evolution (q-bio.PE)
Speciation is fundamental to the huge diversity of life on Earth. Evidence suggests reproductive isolation arises most commonly in allopatry with a higher speciation rate in small populations. Current theory does not address this dependence in the important weak mutation regime. Here, we examine a biophysical model of speciation based on the binding of a protein transcription factor to a DNA binding site, and how their independent co-evolution in two allopatric lineages, in a stabilizing landscape, leads to incompatibilities. Our results give a new prediction for the monomorphic regime of evolution, consistent with data, that smaller populations should develop incompatibilities more quickly. This arises because 1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space; and 2) divergence is slower when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak-shifts or positive selection.
Analysis of adaptive walks on NK fitness landscapes with different interaction schemes
Stefan Nowak, Joachim Krug
Comments: 29 pages, 9 figures
Subjects: Populations and Evolution (q-bio.PE); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Fitness landscapes are genotype-to-fitness mappings commonly used in evolutionary biology and computer science, and they are closely related to spin glass models. In this paper, we study the NK model for fitness landscapes where the interaction scheme between genes can be explicitly defined. The focus is on how this scheme influences the overall shape of the landscape. Our main tools for the analysis are adaptive walks, an idealized dynamics by which the population moves uphill in fitness and terminates at a local fitness maximum. We use three different types of walks and investigate how their length (the number of steps required to reach a local peak) and height (the fitness at the endpoint of the walk) depend on the dimensionality and structure of the landscape. We find that the distribution of local maxima over the landscape is particularly sensitive to the choice of interaction pattern. Most quantities that we measure are simply correlated with the rank of the scheme, which is equal to the number of nonzero coefficients in the expansion of the fitness landscape in terms of Walsh functions.
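An adaptive walk on an NK landscape with an explicit interaction scheme can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: fitness contributions are drawn uniformly at random, each neighborhood here includes the locus itself (so `k` counts the locus plus its partners, which may differ from the paper's convention), only an adjacent-neighborhood scheme is shown, and the two walk variants (uniformly random fitter neighbor, or greedy) may not match the three walk types the authors use. All function names are hypothetical.

```python
import random
from itertools import product


def adjacent_neighborhoods(n, k):
    """Interaction scheme: locus i interacts with the k consecutive loci
    starting at i (wrapping around), itself included."""
    return [tuple((i + d) % n for d in range(k)) for i in range(n)]


def make_nk_landscape(n, neighborhoods, rng):
    """Return a fitness function on binary genotypes of length n.

    neighborhoods[i] lists the loci that determine contribution i; each
    contribution is an independent uniform random number per local state.
    """
    tables = [
        {bits: rng.random() for bits in product((0, 1), repeat=len(hood))}
        for hood in neighborhoods
    ]

    def fitness(genotype):
        total = 0.0
        for table, hood in zip(tables, neighborhoods):
            total += table[tuple(genotype[j] for j in hood)]
        return total / n

    return fitness


def adaptive_walk(fitness, start, rng, greedy=False):
    """Move uphill by single-locus flips until no fitter neighbor exists.

    Returns (length, height, peak): number of steps taken, fitness at the
    endpoint, and the endpoint genotype.
    """
    genotype = list(start)
    current = fitness(genotype)
    steps = 0
    while True:
        fitter = []
        for i in range(len(genotype)):
            genotype[i] ^= 1            # try flipping locus i
            f = fitness(genotype)
            genotype[i] ^= 1            # undo the flip
            if f > current:
                fitter.append((f, i))
        if not fitter:                  # local maximum reached
            return steps, current, genotype
        f, i = max(fitter) if greedy else rng.choice(fitter)
        genotype[i] ^= 1
        current = f
        steps += 1


rng = random.Random(0)
n = 10
landscape = make_nk_landscape(n, adjacent_neighborhoods(n, 3), rng)
start = [rng.randint(0, 1) for _ in range(n)]
length, height, peak = adaptive_walk(landscape, start, rng)
```

Swapping `adjacent_neighborhoods` for another function that returns a list of locus tuples is enough to change the interaction scheme, which is the variable whose effect on walk length and height the abstract is about.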
Entire genome transcription across evolutionary time exposes non-coding DNA to de novo gene emergence
Rafik Neme, Diethard Tautz
Even in the best-studied mammalian genomes, less than 5% of the total genome length is annotated as exonic. However, deep sequencing analysis in humans has shown that around 40% of the genome may be covered by poly-adenylated non-coding transcripts occurring at low levels. Their functional significance is unclear, and there has been a dispute over whether they should be considered as noise of the transcriptional machinery. We propose that if such transcripts show some evolutionary stability they will serve as substrates for de novo gene evolution, i.e. gene emergence out of non-coding DNA. Here, we characterize the phylogenetic turnover of low-level poly-adenylated transcripts in a comprehensive sampling of populations, sub-species and species of the genus Mus, spanning a phylogenetic distance of about 10 Myr. We find evidence for more evolutionarily stable gains of transcription than losses among closely related taxa, balanced by a loss of older transcripts across the whole phylogeny. We show that adding taxa increases the genomic transcript coverage and that no major transcript-free islands exist over time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. Thus, any part of the “non-coding” genome can become subject to evolutionary functionalization via de novo gene evolution.
MMR: A Tool for Read Multi-Mapper Resolution
Andre Kahles, Jonas Behr, Gunnar Rätsch
Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step for most analysis pipelines aiming at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well mapping locations is not only a severe problem during the alignment phase but also has a significant impact on the results of downstream analyses. We present the multimapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads. Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses like transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 17% when using reads of length 51. In addition, MMR decreases the alignment file sizes by more than 50% and this leads to a reduced running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its complexity scales linearly with the number of alignments and requires no further inputs. Supplementary Material: Source code and documentation are available for download at http://github.com/ratschlab/mmr. Supplementary text and figures, comprehensive testing results and further information can be found at http://bioweb.me/mmr.
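The idea of choosing among tied mapping locations by looking at the coverage of other reads can be illustrated with a toy heuristic. The sketch below is an assumption-laden simplification and not the MMR algorithm or its objective: it places each multi-mapping read at the candidate whose window has the highest mean depth, in a single greedy pass, on one linear sequence. The function names and the input format (a per-base depth list and a dict of candidate start positions) are hypothetical.

```python
def mean_depth(coverage, start, read_length):
    """Mean per-base depth over the window a read would occupy."""
    window = coverage[start:start + read_length]
    return sum(window) / len(window) if window else 0.0


def resolve_multimappers(coverage, multi_reads, read_length):
    """Assign each multi-mapping read to one candidate start position.

    coverage: per-base depth from reads that are already placed.
    multi_reads: dict mapping read id -> list of candidate start positions.
    Returns the chosen position per read and the updated coverage.
    Ties go to the first listed candidate.
    """
    coverage = list(coverage)  # work on a copy
    chosen = {}
    for read_id, candidates in multi_reads.items():
        best = max(candidates,
                   key=lambda s: mean_depth(coverage, s, read_length))
        chosen[read_id] = best
        # The placed read now contributes to coverage for later decisions.
        for pos in range(best, min(best + read_length, len(coverage))):
            coverage[pos] += 1
    return chosen, coverage


unique_coverage = [5, 5, 5, 5, 0, 0, 0, 0, 1, 1, 1, 1]
reads = {"r1": [0, 8], "r2": [4, 8]}
placement, updated = resolve_multimappers(unique_coverage, reads, 4)
```

In this example `r1` goes to the well-covered region at position 0 and `r2` to position 8 rather than the uncovered region at position 4. A real tool would read candidate alignments from a BAM file and would need to handle spliced alignments and paired reads, which this sketch ignores.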