Genome-Wide Introgression Revealed Pervasive Hybrid Incompatibilities (HI) between Caenorhabditis species

Yu Bi, Xiaoliang Ren, Cheung Yan, Jiaofang Shao, Dongying Xie, Zhongying Zhao

Systematic characterization of hybrid incompatibility (HI) between related species remains the key to understanding speciation. The genetic basis of HI has been intensively studied in Drosophila species, but remains largely unknown in other species, including nematodes. This is mainly due to the lack of a sister species with which C. elegans can mate and produce viable progeny. The recent discovery of a C. briggsae sister species, C. sp.9, opened up the possibility of dissecting the genetic basis of HI in nematode species. However, a paucity of molecular and genetic tools has prevented the precise mapping of HI loci between the two species. To systematically isolate the HI loci between the nematode species pair, we first generated 96 chromosomally integrated, independent GFP insertions in the C. briggsae genome. We next mapped the GFP insertion sites to defined locations using a method we had developed earlier. The dominant and visible markers facilitated the directional crossing of their linked genomic sequences into C. sp.9. We then backcrossed each individual marker into C. sp.9 for at least 15 generations and produced 111 independent introgression lines, which together represent most of the C. briggsae genome. We finally dissected the HI patterns by scoring embryonic lethality, larval arrest, sex ratio, fertility, male sterility and inviability in a subset of the introgression lines, and identified pervasive HIs between the two species. The study produced a genome-wide landscape of HI between nematode species for the first time. The initial crossing results confirmed Haldane's rule, and the fertility data from homozygous introgressions supported the large-X effect. The large collection of introgression lines allows mapping of numerous HI loci into defined genomic regions between C. briggsae and C. sp.9, thus facilitating further characterization of their genetic and molecular mechanisms.
Importantly, the study permits comparative analysis of speciation genetics between nematodes and other species.

Investigating speciation in face of polyploidization: what can we learn from approximate Bayesian computation approach?

Camille Roux, John Pannell

Despite the importance of polyploidy in the diversification of many eukaryotic clades, particularly plants, detailed genomic analysis of polyploid species is still in its infancy, with published analyses of only a handful of model species to date. Fundamental questions concerning the origin of polyploid lineages (e.g., auto- vs. allopolyploidy) and the extent to which polyploid genomes display different modes of inheritance are poorly resolved for most polyploids, not least because they have hitherto required detailed karyotypic analysis or the analysis of allele segregation at multiple loci in pedigrees or artificial crosses, which are often not practical for non-model species. However, the increasing availability of sequence data for non-model species now presents an opportunity to apply established approaches for the evolutionary analysis of genomic data to polyploid species complexes. Here, we ask whether approximate Bayesian computation (ABC), applied to sequence data produced by next-generation sequencing technologies from polyploid taxa, allows correct inference of the evolutionary and demographic history of polyploid lineages and their close relatives. We use simulations to investigate how the number of sampled individuals, the number of surveyed loci and their length affect the accuracy and precision of evolutionary and demographic inferences by ABC, including the mode of polyploidisation, the mode of inheritance of polyploid taxa, the relative timing of genome duplication and speciation, and the effective population sizes of contributing lineages. We also apply the ABC framework we develop to sequence data from diploid and polyploid species of the plant genus Capsella, for which we infer an allopolyploid origin for the tetraploid C. bursa-pastoris ≈ 90,000 years ago. In general, our results indicate that ABC is a promising and powerful method for uncovering the origin and subsequent evolution of polyploid species.
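
For readers unfamiliar with ABC, the basic rejection scheme can be sketched on a toy problem. This is not the authors' polyploid framework: the binomial model, the uniform prior, the tolerance and all function names below are assumptions chosen only to show the draw, simulate, compare and accept loop.

```python
import random


def simulate_successes(p, n_trials, rng):
    """Toy simulator: number of successes in n_trials Bernoulli(p) draws."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)


def abc_rejection(observed, n_trials, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # assumed prior: uniform on [0, 1)
        simulated = simulate_successes(p, n_trials, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


rng = random.Random(1)
posterior = abc_rejection(observed=30, n_trials=100, n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior of the parameter. In a real application the simulator would be a population-genetic model and the summary would be a vector of sequence statistics rather than a single count.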

Cross-phenotype meta-analysis reveals large-scale trans-eQTLs mediating patterns of transcriptional co-regulation

Boel Brynedal, Towfique Raj, Barbara E Stranger, Robert Bjornson, Benjamin M Neale, Benjamin F Voight, Chris Cotsapas
(Submitted on 7 Feb 2014)

Genetic variation affecting gene regulation is a central driver of phenotypic differences between individuals and can be used to uncover how biological processes are organized in a cell. Although detecting cis-eQTLs is now routine, trans-eQTLs have proven more challenging to find because of the modest variance they explain and the multiple-testing burden of testing millions of SNPs for association with thousands of transcripts. Here, we successfully map trans-eQTLs with the complementary approach of looking for SNPs associated with the expression of multiple genes simultaneously. We find 732 trans-eQTLs that replicate across two continental populations; each trans-eQTL controls large groups of target transcripts (regulons), which are part of interacting networks controlled by transcription factors. We are thus able to uncover co-regulated gene sets and begin describing the cell circuitry of gene regulation.

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

Dennis Kostka, Tara Friedrich, Alisha K. Holloway, Katherine S. Pollard
(Submitted on 1 Feb 2014)

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
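
The closing remark about correlated Bernoulli trials can be illustrated in a deliberately simplified setting. The sketch below is not the motifDiverge model or its R interface. It assumes n independent, identically distributed pairs (X_i, Y_i) of possibly correlated Bernoulli trials, with the same n for both sequences, and computes the probability mass function of D = sum_i (X_i - Y_i) by repeated convolution. The function name and the parameters p10 = P(X=1, Y=0) and p01 = P(X=0, Y=1) are hypothetical.

```python
def difference_pmf(n, p10, p01):
    """PMF of D = sum of (X_i - Y_i) over n iid pairs of correlated Bernoulli trials.

    p10 = P(X=1, Y=0) and p01 = P(X=0, Y=1); pairs with X == Y leave D unchanged.
    Returns a dict mapping each attainable difference d to P(D = d).
    """
    if p10 < 0 or p01 < 0 or p10 + p01 > 1:
        raise ValueError("p10 and p01 must be non-negative and sum to at most 1")
    p_same = 1.0 - p10 - p01
    pmf = {0: 1.0}  # before any pair is added, D = 0 with certainty
    for _ in range(n):
        updated = {}
        for d, prob in pmf.items():
            for step, p_step in ((1, p10), (0, p_same), (-1, p01)):
                updated[d + step] = updated.get(d + step, 0.0) + prob * p_step
        pmf = updated
    return pmf


pmf = difference_pmf(n=10, p10=0.2, p01=0.1)
# Two-sided tail probability of seeing a difference at least as extreme as 4
tail = sum(prob for d, prob in pmf.items() if abs(d) >= 4)
print(round(tail, 4))
```

Correlation between the two trials enters only through the joint probabilities: the more often X and Y agree, the more mass the difference places on zero.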

Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans

Rebekah L. Rogers, Julie M. Cridland, Ling Shao, Tina T. Hu, Peter Andolfatto, Kevin R. Thornton
(Submitted on 28 Jan 2014)

We have used whole-genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of D. yakuba and 20 isofemale lines of D. simulans, and performed genome-wide validation with PacBio long-molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplications, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting deleterious impacts are common. However, we do observe signals consistent with adaptive evolution. D. simulans shows an excess of whole gene duplications and an excess of high frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X. We identify 79 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited non-coding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications, offering a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease and to adaptive evolutionary change.

Footprints of ancient balanced polymorphisms in genetic variation data

Ziyue Gao, Molly Przeworski, Guy Sella
(Submitted on 29 Jan 2014)

When long-lived, balancing selection can lead to trans-species polymorphisms that are shared by two or more species identical by descent. In this case, the gene genealogies at the selected sites cluster by allele instead of by species and, because of linkage, nearby neutral sites also have unusual genealogies. Although it is clear that this scenario should lead to discernible footprints in genetic variation data, notably the presence of additional neutral polymorphisms shared between species and the absence of fixed differences, the effects remain poorly characterized. We focus on the case of a single site under long-lived balancing selection and derive approximations for summaries of the data that are sensitive to a trans-species polymorphism: the length of the segment that carries most of the signals, the expected number of shared neutral SNPs within the segment and the patterns of allelic associations among them. Coalescent simulations of ancient balancing selection confirm the accuracy of our approximations. We further show that for humans and chimpanzees, and more generally for pairs of species with low genetic diversity levels, the patterns of genetic variation on which we focus are highly unlikely to be generated by neutral recurrent mutations, so these statistics are specific as well as sensitive. We discuss the implications of our results for the design and interpretation of genome scans for ancient balancing selection in apes and other taxa.

Estimate of Within Population Incremental Selection Through Branch Imbalance in Lineage Trees

Gilad Liberman, Jennifer Benichou, Lea Tsaban, Yaakov Maman, Jacob Glanville, Yoram Louzoun

Incremental selection within a population, defined as a limited fitness change following a mutation, is an important aspect of many evolutionary processes and can significantly affect a large number of mutations throughout the genome. Strongly advantageous or deleterious mutations are detected through the fixation of mutations in the population, using the ratio of synonymous to non-synonymous mutations in sequences. There are currently no precise methods to estimate incremental selection occurring over limited periods. Here we provide, for the first time, such a detailed method and show its precision and its applicability to the genomic analysis of selection. A special case of evolution is rapid, short-term micro-evolution, where organisms are under constant adaptation, occurring for example in viruses infecting a new host, B cells mutating during germinal center reactions or mitochondria evolving within a given host. The proposed method is a novel mixed lineage tree/sequence based method to detect within-population selection as defined by the effect of mutations on the average number of offspring. Specifically, we propose to measure the log of the ratio between the number of leaves in lineage tree branches following synonymous and non-synonymous mutations. This method does not require a baseline model and is practically unaffected by sampling biases. In order to show the wide applicability of this method, we apply it to multiple cases of micro-evolution, and show that it can detect genes and inter-genic regions using the selection rate and detect selection pressures in viral proteins and in the immune response to pathogens.
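
One possible reading of the proposed statistic can be sketched as follows. The abstract does not specify the tree representation, how leaf counts are averaged, or the direction of the ratio. The `Node` class, the averaging over branches and the choice of log(non-synonymous / synonymous) are therefore assumptions made for illustration, not the authors' implementation.

```python
import math


class Node:
    """Lineage-tree node; `mutation` labels the branch leading to it ('S', 'N' or None)."""

    def __init__(self, mutation=None, children=None):
        self.mutation = mutation
        self.children = children or []


def leaf_counts_by_mutation(root):
    """Collect, for each labelled branch, the number of leaves in the subtree below it."""
    counts = {"S": [], "N": []}

    def count_leaves(node):
        if not node.children:
            n_leaves = 1
        else:
            n_leaves = sum(count_leaves(child) for child in node.children)
        if node.mutation in counts:
            counts[node.mutation].append(n_leaves)
        return n_leaves

    count_leaves(root)
    return counts


def log_leaf_ratio(root):
    """Log of mean leaf count after non-synonymous over synonymous mutations (assumed direction)."""
    counts = leaf_counts_by_mutation(root)
    if not counts["S"] or not counts["N"]:
        raise ValueError("tree needs at least one synonymous and one non-synonymous branch")
    mean_s = sum(counts["S"]) / len(counts["S"])
    mean_n = sum(counts["N"]) / len(counts["N"])
    return math.log(mean_n / mean_s)


# Toy tree: a synonymous branch leading to 2 leaves, a non-synonymous branch leading to 4
tree = Node(children=[
    Node("S", [Node(), Node()]),
    Node("N", [Node(), Node(), Node(), Node()]),
])
score = log_leaf_ratio(tree)
print(score)
```

Under this convention a positive score means that lineages carrying non-synonymous mutations left more sampled descendants than those carrying synonymous ones, and a negative score means the opposite.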

Demography and the age of rare variants

Iain Mathieson, Gil McVean
(Submitted on 16 Jan 2014)

Recently, large whole-genome sequencing projects have provided access to much of the rare variation in human populations. This variation is highly informative about population structure and recent demography. In this paper, we show how the age of rare variants can be estimated from patterns of haplotype sharing and how this information can detect and quantify historical relationships between populations. We investigate the distribution of the age of f2 variants in a worldwide sample sequenced by the 1000 Genomes Project, revealing enormous variation across populations. The median age of f2 variants shared within continents is 50 to 160 generations for Europe and Asia, and 170 to 320 generations for Africa. Variants shared between continents are much older, with median ages ranging from 320 to 670 generations between Europe and Asia, and 1,000 to 2,400 generations between African and non-African populations. The distribution of the ages of variants shared across populations is informative about their demography, revealing recent bottlenecks, ancient splits, and more modern connections between populations. We see the signature of selection in the observation that functional variants are significantly younger than nonfunctional variants of the same frequency. This approach is relatively insensitive to mutation rate and complements other nonparametric methods for demographic inference.
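
As a small illustration of the starting point of such an analysis, the sketch below identifies candidate f2 variants in a toy haplotype matrix. It assumes that "f2" denotes variants whose derived allele is carried by exactly two sampled chromosomes. The data layout and function name are hypothetical, and the age estimation from haplotype sharing described in the abstract is not attempted here.

```python
def find_f2_variants(haplotypes):
    """Return (site, (i, j)) for every site whose derived allele (1) is carried by exactly
    two haplotypes i < j. `haplotypes` is a list of equal-length lists of 0/1 values."""
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    f2_sites = []
    for site in range(n_sites):
        carriers = [i for i, hap in enumerate(haplotypes) if hap[site] == 1]
        if len(carriers) == 2:
            f2_sites.append((site, (carriers[0], carriers[1])))
    return f2_sites


# Four toy haplotypes over five sites
haplotypes = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
f2 = find_f2_variants(haplotypes)
print(f2)
```

Each reported pair marks two chromosomes that share a recent common ancestor at that site. The length of the haplotype they share around the variant is the kind of information from which an age could then be estimated.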

Population genomics of Saccharomyces cerevisiae human isolates: passengers, colonizers, invaders.
Carlotta De Filippo, Monica Di Paola, Irene Stefanini, Lisa Rizzetto, Luisa Berná, Matteo Ramazzotti, Leonardo Dapporto, Damariz Rivero, Ivo G Gut, Marta Gut, Mónica Bayés, Jean-Luc Legras, Roberto Viola, Cristina Massi-Benedetti, Antonella De Luca, Luigina Romani, Paolo Lionetti, Duccio Cavalieri

The quest for the ecological niches of Saccharomyces cerevisiae has ranged from wineries to oaks and, more recently, to the gut of Crabro wasps. Here we propose a role for the human gut in shaping S. cerevisiae evolution, presenting the genetic structure of a previously unknown population of yeasts associated with Crohn's disease and providing evidence for clonal expansion within the human gut. To understand the role of immune function in the human-yeast interaction, we classified strains according to their immunomodulatory properties, discovering a set of genetically homogeneous isolates capable of inducing anti-inflammatory signals via regulatory T cell proliferation and, conversely, a positive association between strain mosaicism and the ability to elicit inflammatory, IL-17-driven immune responses. The approach integrating genomics with immune phenotyping showed selection on genes involved in sporulation and cell wall remodeling as central for the evolution of S. cerevisiae Crohn's strains from passengers to commensals to potential pathogens.

The existence and abundance of ghost ancestors in biparental populations


Simon Gravel, Mike Steel
(Submitted on 15 Jan 2014)

In a randomly-mating biparental population of size N there are, with high probability, individuals who are genealogical ancestors of every extant individual within approximately log2(N) generations into the past. We use this result of Chang to prove a curious corollary under standard models of recombination: there exist, with high probability, individuals within a constant multiple of log2(N) generations into the past who are simultaneously (i) genealogical ancestors of each of the individuals at the present, and (ii) genetic ancestors to none of the individuals at the present. Such ancestral individuals, ancestors of everyone today who left no genetic trace, represent 'ghost' ancestors in a strong sense. In this short note, we use a simple analytical argument and simulations to estimate how many such individuals exist in Wright-Fisher populations.
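
The genealogical half of this statement can be explored with a small simulation. The sketch below assumes a constant population of size N in which every individual draws two parents independently and uniformly at random from the previous generation (so both parents may coincide), and it records how many generations back the first ancestor of everyone appears. It does not model recombination, so it says nothing about genetic ancestry or ghost ancestors. The function name and parameters are hypothetical.

```python
import math
import random


def generations_to_universal_ancestor(n, rng, max_generations=1000):
    """Generations back until some individual is a genealogical ancestor of all n
    present-day individuals, or None if that does not happen within max_generations."""
    # descendants[i] is a bitmask of present-day individuals descended from individual i
    descendants = [1 << i for i in range(n)]
    everyone = (1 << n) - 1
    for t in range(1, max_generations + 1):
        parent_descendants = [0] * n
        for child_mask in descendants:
            for _ in range(2):  # two parents, drawn independently with replacement
                parent = rng.randrange(n)
                parent_descendants[parent] |= child_mask
        descendants = parent_descendants
        if any(mask == everyone for mask in descendants):
            return t
    return None


rng = random.Random(0)
n = 256
t = generations_to_universal_ancestor(n, rng)
print(t, math.log2(n))
```

Repeating the run for several seeds and population sizes gives a rough sense of how the waiting time scales with log2(N).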