Population genetics on islands connected by an arbitrary network: An analytic approach
George W A Constable, Alan J McKane
(Submitted on 11 Feb 2014)

We analyse a model consisting of a population of individuals that is subdivided into a finite set of demes, each of which has a fixed but differing number of individuals. The individuals can reproduce, die and migrate between the demes according to an arbitrary migration network. They are haploid, with two alleles present in the population; frequency-independent selection is also incorporated, and the strength and direction of selection can vary from deme to deme. The system is formulated as an individual-based model, and the diffusion approximation is systematically applied to express it as a set of nonlinear coupled stochastic differential equations. These can be made amenable to analysis through the elimination of fast-time variables. The resulting reduced model is analysed in a number of situations, including migration-selection balance leading to a polymorphic equilibrium of the two alleles, and an illustration of how the subdivision of the population can lead to non-trivial behaviour when the network is a simple hub. The method we develop is systematic, may be applied to any network, and agrees well with the results of simulations in all cases studied and across a wide range of parameter values.
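Migration-selection balance can be illustrated with a deterministic caricature of such a model. The sketch below is not the authors' stochastic system: it assumes equal deme sizes, ignores drift entirely, and uses the generic equations $\dot x_i = s_i x_i (1 - x_i) + \sum_j m_{ij}(x_j - x_i)$, where $x_i$ is the allele frequency in deme $i$, $s_i$ a deme-specific selection coefficient and $m_{ij}$ a hypothetical migration rate from deme $j$ into deme $i$ (the matrix encodes the network). The function name `simulate_demes` is illustrative.

```python
def simulate_demes(x0, s, m, dt=0.1, t_end=1000.0):
    """Forward-Euler integration of a deterministic island-model caricature (assumption).

    x0[i]   : initial frequency of allele A in deme i
    s[i]    : selection coefficient in deme i (its sign gives the favoured allele)
    m[i][j] : migration rate from deme j into deme i (the network; m[i][i] is ignored)
    """
    x = list(x0)
    n = len(x)
    for _ in range(int(round(t_end / dt))):
        # Logistic selection term plus linear migration from connected demes.
        dx = [
            s[i] * x[i] * (1.0 - x[i])
            + sum(m[i][j] * (x[j] - x[i]) for j in range(n) if j != i)
            for i in range(n)
        ]
        # Euler step, clamped so that frequencies stay in [0, 1].
        x = [min(1.0, max(0.0, xi + dt * dxi)) for xi, dxi in zip(x, dx)]
    return x


# Two demes favouring opposite alleles, linked by symmetric migration.
x_eq = simulate_demes([0.5, 0.5], s=[0.1, -0.1], m=[[0.0, 0.05], [0.05, 0.0]])
print(x_eq)
```

With $s = \pm 0.1$ and $m = 0.05$, the equilibrium solves $s x(1-x) + m(1-2x) = 0$, giving $x_1 = 1/\sqrt{2} \approx 0.707$ and $x_2 = 1 - x_1$: in this toy setting both alleles are maintained by migration-selection balance rather than one of them fixing.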

Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora
Robert Williamson, Emily B Josephs, Adrian E Platts, Khaled M Hazzouri, Annabelle Haudry, Mathieu Blanchette, Stephen I Wright

The extent to which positive and negative selection vary across different portions of plant genomes remains poorly understood. Here we sequence the whole genomes of 13 Capsella grandiflora individuals and quantify the amount of selection across the genome. Using an estimate of the distribution of fitness effects, we show that selection is strong in coding regions but weak in most noncoding regions, with the exception of 5’ and 3’ untranslated regions (UTRs). However, estimates of selection in noncoding regions conserved across the Brassicaceae family show strong signals of selection. Additionally, we see reductions in neutral diversity around functional substitutions in both coding and conserved noncoding regions, indicating recent selective sweeps at these sites. Finally, using expression data from leaf tissue, we show that more highly expressed genes experience stronger negative selection than lowly expressed genes, but comparable levels of positive selection.

Discovering functional DNA elements using population genomic information: A proof of concept using human mtDNA
Daniel R. Schrider, Andrew D. Kern
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN)

Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research, including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly, this signal has been exploited through genomic comparisons of distantly related species. However, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here we assess the potential of this approach using 16,411 human mitochondrial genomes. We show that the high density of polymorphism in this dataset precisely delineates regions experiencing purifying selection. Further, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mtDNA. These two measures track one another at a remarkably fine scale across many loci, a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal exactly which nucleotides in the genome are currently performing important functions and are likely to have deleterious fitness effects when mutated. As more complete genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome.

The fixation time of a strongly beneficial allele in a structured population
Andreas Greven, Peter Pfaffelhuber, Cornelia Pokalyuk, Anton Wakolbinger
Comments: 41 pages, 4 figures
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

For a beneficial allele which enters a large unstructured population and eventually goes to fixation, it is known that the time to fixation is approximately $2\log(\alpha)/\alpha$ for a large selection coefficient $\alpha$. In the presence of spatial structure with migration between colonies, we detect various regimes of the migration rate $\mu$ for which the fixation times have different asymptotics as $\alpha \to \infty$. If $\mu$ is of order $\alpha$, the allele fixes (as in the spatially unstructured case) in time $\sim 2\log(\alpha)/\alpha$. If $\mu$ is of order $\alpha^p$ with $0\leq p \leq 1$, the fixation time is $\sim (2 + (1-p)d) \log(\alpha)/\alpha$, where $d$ is the maximum number of migration steps required to reach any other colony from the colony where the beneficial allele entered. If $\mu = 1/\log(\alpha)$, the fixation time is $\sim (2+S)\log(\alpha)/\alpha$, where $S$ is a random time in a simple epidemic model. The main idea of our analysis is to combine a new moment dual for the process conditioned on fixation with the time reversal in equilibrium of a spatial version of Neuhauser and Krone’s ancestral selection graph.
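The asymptotic formulas above are easy to compare numerically. The following sketch only evaluates the leading-order expressions quoted in the abstract, dropping all lower-order corrections; the function names are hypothetical. The random time $S$ from the epidemic model is not simulated here and must be supplied by the caller.

```python
import math


def t_unstructured(alpha: float) -> float:
    """~ 2 log(alpha) / alpha: fixation time in an unstructured population."""
    return 2.0 * math.log(alpha) / alpha


def t_power_law_migration(alpha: float, p: float, d: int) -> float:
    """~ (2 + (1 - p) d) log(alpha) / alpha when mu is of order alpha**p, 0 <= p <= 1.

    d is the maximal number of migration steps from the entry colony to any other colony.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (2.0 + (1.0 - p) * d) * math.log(alpha) / alpha


def t_log_migration(alpha: float, S: float) -> float:
    """~ (2 + S) log(alpha) / alpha when mu = 1/log(alpha), for one given value of S."""
    return (2.0 + S) * math.log(alpha) / alpha


# Slow-down relative to the unstructured case for alpha = 1000, d = 3, p = 0.5.
ratio = t_power_law_migration(1000.0, 0.5, 3) / t_unstructured(1000.0)
print(round(ratio, 6))  # 1.75
```

With $d = 3$ (for instance, a hypothetical chain of four colonies with the allele entering at one end), $p = 0.5$ gives a fixation time $(2 + 1.5)/2 = 1.75$ times the unstructured value, while $p = 1$ recovers the unstructured result exactly.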

Extensive epistasis within the MHC contributes to the genetic architecture of celiac disease
Ben Goudey, Gad Abraham, Eder Kikianty, Qiao Wang, Dave Rawlinson, Fan Shi, Izhak Haviv, Linda Stern, Adam Kowalczyk, Michael Inouye

Epistasis has long been thought to contribute to the genetic aetiology of complex diseases, yet few robust epistatic interactions in humans have been detected. We have conducted exhaustive genome-wide scans for pairwise epistasis in five independent celiac disease (CeD) case-control studies, using a rapid model-free approach to examine over 500 billion SNP pairs in total. We found extensive epistasis within the MHC region, with 7,270 statistically significant pairs achieving stringent replication criteria across multiple studies. These robust epistatic pairs partially tagged CeD risk HLA haplotypes, and replicable evidence for epistatic SNPs outside the MHC was not observed. Both within and between European populations, we observed striking consistency of epistatic models and of the distribution of epistatic models, thus providing empirical estimates of their frequencies in a complex disease. Within the UK population, models of CeD composed of both epistatic and additive single-SNP effects increased the explained CeD variance by approximately 1% over models of single SNPs alone. Further analysis showed that additive SNP effects tag epistatic effects (and vice versa), sometimes involving SNPs separated by a megabase or more. These findings show that the genetic architecture of CeD consists of overlapping additive and epistatic components, indicating that this architecture, and potentially that of other common autoimmune diseases, is more complex than previously thought.

The roles of standing genetic variation and evolutionary history in determining the evolvability of anti-predator strategies
Jordan Fish, Daniel R O’Donnell, Abhijna Parigi, Ian Dworkin, Aaron P Wagner

Standing genetic variation and the historical environment in which that variation arises (evolutionary history) are both potentially significant determinants of a population’s capacity for evolutionary response to a changing environment. We evaluated the relative importance of these two factors in shaping evolutionary trajectories in the face of sudden environmental change. We used the open-ended digital evolution software Avida to examine how historical exposure to predation pressures, different levels of genetic variation, and combinations of the two affect the anti-predator strategies and competitive abilities that evolve in the face of threats from new, invasive predator populations. We show that while standing genetic variation plays some role in determining evolutionary responses, evolutionary history has the greater influence on a population’s capacity to evolve effective anti-predator traits. This adaptability likely reflects the relative ease of repurposing existing, relevant genes and traits, and the broader potential value of generating and maintaining adaptively flexible traits in evolving populations.

Genome-Wide Introgression Revealed Pervasive Hybrid Incompatibilities (HI) between Caenorhabditis species
Yu Bi, Xiaoliang Ren, Cheung Yan, Jiaofang Shao, Dongying Xie, Zhongying Zhao

Systematic characterization of hybrid incompatibility (HI) between related species remains key to understanding speciation. The genetic basis of HI has been intensively studied in Drosophila species but remains largely unknown in other species, including nematodes. This is mainly due to the lack of a sister species with which C. elegans can mate and produce viable progeny. The recent discovery of a C. briggsae sister species, C. sp.9, opened up the possibility of dissecting the genetic basis of HI in nematode species. However, the paucity of molecular and genetic tools has prevented the precise mapping of HI loci between the two species. To systematically isolate the HI loci between this nematode species pair, we first generated 96 chromosomally integrated, independent GFP insertions in the C. briggsae genome. We next mapped the GFP insertion sites to defined locations using a method we had developed earlier. These dominant, visible markers facilitated the directional crossing of their linked genomic sequences into C. sp.9. We then backcrossed each individual marker into C. sp.9 for at least 15 generations and produced 111 independent introgression lines, which together represent most of the C. briggsae genome. Finally, we dissected the HI patterns by scoring embryonic lethality, larval arrest, sex ratio, fertility, male sterility and inviability in a subset of the introgression lines, and identified pervasive HIs between the two species. The study produced, for the first time, a genome-wide landscape of HI between nematode species. The initial crossing results confirmed Haldane’s rule, and the fertility data from homozygous introgressions supported the large X effect. The large collection of introgression lines allows numerous HI loci between C. briggsae and C. sp.9 to be mapped into defined genomic regions, thus facilitating further characterization of their genetic and molecular mechanisms. Importantly, the study permits comparative analysis of speciation genetics between nematodes and other species.

Investigating speciation in face of polyploidization: what can we learn from approximate Bayesian computation approach?
Camille Roux, John Pannell

Despite its importance in the diversification of many eukaryote clades, particularly plants, detailed genomic analysis of polyploid species is still in its infancy, with published analyses of only a handful of model species to date. Fundamental questions concerning the origin of polyploid lineages (e.g., auto- vs. allopolyploidy) and the extent to which polyploid genomes display different modes of inheritance are poorly resolved for most polyploids, not least because they have hitherto required detailed karyotypic analysis or the analysis of allele segregation at multiple loci in pedigrees or artificial crosses, which is often impractical for non-model species. However, the increasing availability of sequence data for non-model species now presents an opportunity to apply established approaches for the evolutionary analysis of genomic data to polyploid species complexes. Here, we ask whether approximate Bayesian computation (ABC), applied to sequence data produced by next-generation sequencing technologies from polyploid taxa, allows correct inference of the evolutionary and demographic history of polyploid lineages and their close relatives. We use simulations to investigate how the number of sampled individuals, the number of surveyed loci and their length affect the accuracy and precision of evolutionary and demographic inferences by ABC, including the mode of polyploidisation, the mode of inheritance of polyploid taxa, the relative timing of genome duplication and speciation, and the effective population sizes of contributing lineages. We also apply the ABC framework we develop to sequence data from diploid and polyploid species of the plant genus Capsella, for which we infer an allopolyploid origin for the tetraploid C. bursa-pastoris ≈ 90,000 years ago. In general, our results indicate that ABC is a promising and powerful method for uncovering the origin and subsequent evolution of polyploid species.
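To make the ABC idea concrete, here is a minimal rejection-ABC sketch on a deliberately simple one-parameter problem, not the polyploid framework of the study. It assumes a sample of $n$ sequences whose number of segregating sites $S$ is drawn as Poisson with mean $\theta a_n$, where $a_n = \sum_{i=1}^{n-1} 1/i$; this matches only the neutral-coalescent expectation $E[S] = \theta a_n$, not the full distribution. The uniform prior, tolerance and function names are illustrative assumptions. Parameter values are drawn from the prior, data are simulated, and only draws whose summary statistic lands within the tolerance of the observed value are kept as an approximate posterior sample.

```python
import math
import random


def watterson_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n under the neutral coalescent."""
    return sum(1.0 / i for i in range(1, n))


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def abc_rejection(s_obs, n, n_draws=20000, tol=2, prior=(0.0, 20.0), seed=1):
    """Rejection ABC for theta with a uniform prior and S as the summary statistic."""
    rng = random.Random(seed)
    a_n = watterson_a(n)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)          # 1. draw a candidate from the prior
        s_sim = poisson(theta * a_n, rng)    # 2. simulate data (toy model, an assumption)
        if abs(s_sim - s_obs) <= tol:        # 3. keep it if the summary is close enough
            accepted.append(theta)
    return accepted


posterior = abc_rejection(s_obs=35, n=20)
print(len(posterior), sum(posterior) / len(posterior))
```

For $S_{obs} = 35$ in a sample of 20 sequences ($a_{20} \approx 3.548$), the accepted values should centre near 10, close to Watterson's estimate $35/a_{20} \approx 9.9$. In practice, ABC studies of demographic history replace the toy simulator with coalescent simulations of competing models and compare many summary statistics at once.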

Cross-phenotype meta-analysis reveals large-scale trans-eQTLs mediating patterns of transcriptional co-regulation
Boel Brynedal, Towfique Raj, Barbara E Stranger, Robert Bjornson, Benjamin M Neale, Benjamin F Voight, Chris Cotsapas
(Submitted on 7 Feb 2014)

Genetic variation affecting gene regulation is a central driver of phenotypic differences between individuals and can be used to uncover how biological processes are organized in a cell. Although detecting cis-eQTLs is now routine, trans-eQTLs have proven more challenging to find due to the modest variance they explain and the multiple-testing burden of testing millions of SNPs for association with thousands of transcripts. Here, we map trans-eQTLs with the complementary approach of looking for SNPs associated with the expression of multiple genes simultaneously. We find 732 trans-eQTLs that replicate across two continental populations; each trans-eQTL controls large groups of target transcripts (regulons), which are part of interacting networks controlled by transcription factors. We are thus able to uncover co-regulated gene sets and begin describing the cell circuitry of gene regulation.

The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima
Ard A Louis, Steffen Schaper
(Submitted on 6 Feb 2014)

Genotype-phenotype (GP) maps specify how the random mutations that change genotypes generate variation by altering phenotypes, which, in turn, can trigger selection. Many GP maps share the following general properties: 1) the number of genotypes $N_G$ is much larger than the number of selectable phenotypes; 2) neutral exploration changes the variation that is accessible to the population; 3) the distribution of phenotype frequencies $F_p = N_p/N_G$, with $N_p$ the number of genotypes mapping onto phenotype $p$, is highly biased: the majority of genotypes map to only a small minority of the phenotypes. Here we explore how these properties affect the evolutionary dynamics of haploid Wright-Fisher models that are coupled either to a simplified, general random GP map or to a more complex RNA sequence-to-secondary-structure map. For both maps, the probability of a mutation leading to a phenotype $p$ scales to first order as $F_p$, although the RNA map also shows further correlations. Using mean-field theory, supported by computer simulations, we show that the discovery time $T_p$ of a phenotype $p$ similarly scales to first order as $1/F_p$ for a wide range of population sizes and mutation rates in both the monomorphic and polymorphic regimes. These differences in the rate at which variation arises can span many orders of magnitude. Phenotypic variation with a larger $F_p$ is therefore much more likely to arise than variation with a small $F_p$. We show, using the RNA model, that frequent phenotypes (with larger $F_p$) can fix in a population even when alternative, but less frequent, phenotypes with much higher fitness are potentially accessible. In other words, if the fittest never ‘arrive’ on the timescales of evolutionary change, then they can’t fix. We call this highly non-ergodic effect the ‘arrival of the frequent’.
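The first-order argument, that a phenotype produced with probability $F_p$ per mutation is discovered after roughly $1/F_p$ mutations, can be checked with a toy sketch. The map below is hypothetical and far simpler than the random or RNA maps studied: genotypes are all binary strings of length 10 and the phenotype is the number of 1s, which already yields a strongly biased, binomial $F_p$. Each mutation is assumed to produce phenotype $p$ independently with probability $F_p$, so the waiting time is geometric with mean $1/F_p$; population size, drift and selection are ignored, and the function names are illustrative.

```python
import random
from collections import Counter
from itertools import product


def phenotype_frequencies(gp_map):
    """F_p = N_p / N_G, where gp_map maps every genotype to its phenotype."""
    n_g = len(gp_map)
    return {p: n_p / n_g for p, n_p in Counter(gp_map.values()).items()}


def mean_discovery_time(f_p, trials, rng):
    """Mean number of mutations until phenotype p first appears, assuming each
    mutation independently produces p with probability f_p (first-order picture)."""
    total = 0
    for _ in range(trials):
        t = 1
        while rng.random() >= f_p:  # this mutation missed p; try another one
            t += 1
        total += t
    return total / trials


# Hypothetical toy map: phenotype = number of 1s in a length-10 binary genotype.
gp_map = {g: sum(g) for g in product((0, 1), repeat=10)}
freqs = phenotype_frequencies(gp_map)

rng = random.Random(0)
for p in (5, 1):
    print(p, freqs[p], mean_discovery_time(freqs[p], 4000, rng), 1 / freqs[p])
```

Here $F_5 = 252/1024 \approx 0.25$ while $F_1 = 10/1024$, so phenotype 1 takes about 25 times longer to be discovered than phenotype 5, and phenotype 0 ($F_0 = 1/1024$) about 250 times longer, regardless of how fit either phenotype might be.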