The Spatial Mixing of Genomes in Secondary Contact Zones

Alisa Sedghifar , Yaniv Brandvain , Peter L. Ralph , Graham Coop
doi: http://dx.doi.org/10.1101/016337

Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. The extent of this LD can be used to infer the timing of admixture. However, the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations are diffusing back together. We derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer the age of contact zones using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. We use our approach to explore the fit of a geographic contact zone model to three human population genomic datasets, from populations along the Indonesian archipelago, in Central Asia and in India.
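For intuition about how LD decay encodes time, the sketch below uses the simpler textbook case rather than the authors' spatial model: a single admixture pulse t generations ago in one randomly mating population, where LD between two ancestry-informative loci at recombination fraction r decays as D_t = D_0 (1 - r)^t. Haldane's map function converts map distance (in Morgans) to r. The function names, the assumption that D_0 is known, and the integer grid search are illustrative choices; this is not the inference method described in the paper.

```python
import math

def haldane_r(d_morgans: float) -> float:
    """Recombination fraction from map distance (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def expected_ld(d_morgans: float, t: int, d0: float) -> float:
    """Expected LD t generations after a single admixture pulse (panmictic model, assumed)."""
    return d0 * (1.0 - haldane_r(d_morgans)) ** t

def fit_admixture_time(distances, ld_values, d0, t_max=500):
    """Grid-search the integer t that minimizes squared error between observed and expected LD."""
    best_t, best_err = 0, float("inf")
    for t in range(t_max + 1):
        err = sum((ld - expected_ld(d, t, d0)) ** 2 for d, ld in zip(distances, ld_values))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Noise-free toy data generated with t = 40 and D_0 = 0.25 (hypothetical values).
distances = [0.001 * i for i in range(1, 51)]  # 0.1 to 5 cM
observed = [expected_ld(d, 40, 0.25) for d in distances]
print(fit_admixture_time(distances, observed, d0=0.25))  # 40
```

In a real analysis D_0 would be estimated jointly with t and the data would be noisy; the spatial model in the abstract replaces the single exponential with a mixture of decay curves reflecting migrants of different ages.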

Eight thousand years of natural selection in Europe

Iain Mathieson , Iosif Lazaridis , Nadin Rohland , Swapan Mallick , Bastien Llamas , Joseph Pickrell , Harald Meller , Manuel A. Rojo Guerra , Johannes Krause , David Anthony , Dorcas Brown , Carles Lalueza Fox , Alan Cooper , Kurt W. Alt , Wolfgang Haak , Nick Patterson , David Reich
doi: http://dx.doi.org/10.1101/016477

The arrival of farming in Europe beginning around 8,500 years ago required adaptation to new environments, pathogens, diets, and social organizations. While evidence of natural selection can be revealed by studying patterns of genetic variation in present-day people, these patterns are only indirect echoes of past events and provide little information about where and when selection occurred. Ancient DNA makes it possible to examine populations as they were before, during and after adaptation events, and thus to reveal the tempo and mode of selection. Here we report the first genome-wide scan for selection using ancient DNA, based on 83 human samples from Holocene Europe analyzed at over 300,000 positions. We find five genome-wide signals of selection, at loci associated with diet and pigmentation. Surprisingly, in light of suggestions of selection on immune traits associated with the advent of agriculture and denser living conditions, we find no strong sweeps associated with immunological phenotypes. We also report a scan for selection on complex traits, and find two signals of selection on height: for short stature in Iberia after the arrival of agriculture, and for tall stature on the Pontic-Caspian steppe earlier than 5,000 years ago. Unexpectedly, Scandinavian hunter-gatherers living around 8,000 years ago carried at high frequency the derived allele at the EDAR gene, which is the strongest known signal of selection in East Asians and is thought to have arisen in East Asia. These results document the power of ancient DNA to reveal features of past adaptation that could not be understood from analyses of present-day people.

PoMo: An Allele Frequency-based Approach for Species Tree Estimation

Nicola De Maio , Dominik Schrempf , Carolin Kosiol
doi: http://dx.doi.org/10.1101/016360

Incomplete lineage sorting can cause incongruence between the overall species-level phylogenetic tree and the phylogenetic trees of individual genes or genomic segments. If this incongruence is not accounted for, species tree estimates can suffer from several biases. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within- and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations, and compare it with existing methods in both accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele-frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.
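The allele-frequency idea behind PoMo can be illustrated by its state space: each species is represented by a small virtual population of N individuals, so a site is either fixed for one of the four nucleotides or polymorphic for exactly two alleles with counts n and N - n (0 < n < N). That gives 4 + 6(N - 1) states. The enumeration below is a minimal sketch under those assumptions (four nucleotides, at most two alleles per site); the function name and tuple encoding are hypothetical and not taken from the PoMo software.

```python
from itertools import combinations

NUCLEOTIDES = ("A", "C", "G", "T")

def pomo_states(n_virtual: int):
    """Enumerate PoMo-style states for a virtual population of size n_virtual.

    Each state is a tuple of (allele, count) pairs: boundary states have one
    allele fixed at count N; polymorphic states carry exactly two alleles with
    counts n and N - n, where 0 < n < N.
    """
    # Boundary (monomorphic) states: one per nucleotide.
    states = [((a, n_virtual),) for a in NUCLEOTIDES]
    # Polymorphic states: N - 1 frequency classes for each of the 6 allele pairs.
    for a, b in combinations(NUCLEOTIDES, 2):
        for n in range(1, n_virtual):
            states.append(((a, n), (b, n_virtual - n)))
    return states

print(len(pomo_states(10)))  # 58 = 4 + 6 * 9
```

Because the state space grows only linearly in N, a small virtual population keeps the rate matrix compact while still letting within-species polymorphism inform the tree.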

A Comparison of Methods to Measure Fitness in Escherichia coli

Michael J Wiser , Richard E Lenski
doi: http://dx.doi.org/10.1101/016121

In order to characterize the dynamics of adaptation, it is important to be able to quantify how a population’s mean fitness changes over time. Such measurements are especially important in experimental studies of evolution using microbes. The Long-Term Evolution Experiment (LTEE) with Escherichia coli provides one such system in which mean fitness has been measured by competing derived and ancestral populations. The traditional method used to measure fitness in the LTEE and many similar experiments, though, is subject to a potential limitation. As the relative fitness of the two competitors diverges, the measurement error increases because the less-fit population becomes increasingly small and cannot be enumerated as precisely. Here, we present and employ two alternatives to the traditional method. One is based on reducing the fitness differential between the competitors by using a common reference competitor from an intermediate generation that has intermediate fitness; the other alternative increases the initial population size of the less-fit, ancestral competitor. We performed a total of 480 competitions to compare the statistical properties of estimates obtained using these alternative methods with those obtained using the traditional method for samples taken over 50,000 generations from one of the LTEE populations. On balance, neither alternative method yielded measurements that were more precise than the traditional method.
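Fitness in LTEE-style competitions is conventionally expressed as the ratio of the two competitors' realized Malthusian parameters, W = ln(A_1/A_0) / ln(B_1/B_0), where A and B are the densities of the two competitors at the start (0) and end (1) of the competition. The sketch below assumes the final counts must be corrected by a single combined dilution factor between the two samples; the parameter name `dilution_factor` and the example counts are hypothetical. It also illustrates why a rare, less-fit competitor is measured less precisely: for a Poisson colony count, the standard error of ln(count) is roughly 1/sqrt(count), so it grows as the count shrinks.

```python
import math

def malthusian(initial: float, final: float, dilution_factor: float) -> float:
    """Realized Malthusian parameter: log of net growth, correcting final density for dilution."""
    return math.log(final * dilution_factor / initial)

def relative_fitness(a0, a1, b0, b1, dilution_factor=100.0) -> float:
    """Fitness of competitor A relative to competitor B (W = m_A / m_B)."""
    return malthusian(a0, a1, dilution_factor) / malthusian(b0, b1, dilution_factor)

def log_count_se(count: int) -> float:
    """Approximate standard error of ln(count) for a Poisson colony count (delta method)."""
    return 1.0 / math.sqrt(count)

# Hypothetical colony counts: A grows 1,000-fold, B grows 100-fold over the competition.
w = relative_fitness(a0=100, a1=1000, b0=100, b1=100, dilution_factor=100.0)
print(round(w, 3))  # 1.5
# Counting error on the log scale for an abundant versus a rare competitor.
print(round(log_count_se(400), 3), round(log_count_se(4), 3))  # 0.05 0.5
```

The two alternatives in the abstract both aim to keep the rarer competitor's count from collapsing, which is what drives up `log_count_se` in this toy calculation.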

svviz: a read viewer for validating structural variants

Noah Spies , Justin M Zook , Marc Salit , Arend Sidow
doi: http://dx.doi.org/10.1101/016063

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only the reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Reads are assigned to the proper allele based on alignment score, read-pair orientation and insert size. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele than the single reference genome-based view common to most current read browsers. The web view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations, and manually refining breakpoints. An optional command-line-only interface allows summary statistics and graphics to be exported directly to standard graphics file formats. svviz requires as input only structural variant coordinates (called using any other software package), reads in BAM format, and a reference genome. Reads from any high-throughput sequencing platform are supported, including Illumina short-read, mate-pair, synthetic long-read (assembled), Pacific Biosciences, and Oxford Nanopore. svviz is open source and freely available from https://github.com/svviz/svviz.
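The allele-assignment step can be pictured with a minimal sketch that looks only at alignment scores; svviz also uses read-pair orientation and insert size, which are omitted here. The `ReadScores` class, the `min_margin` threshold and the use of the fraction of variant-supporting reads as a zygosity hint are hypothetical illustrations, not svviz's actual code or options.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class ReadScores:
    name: str
    ref_score: int  # alignment score against the reference allele
    alt_score: int  # alignment score against the putative variant allele

def assign_reads(reads: Iterable[ReadScores], min_margin: int = 2) -> dict:
    """Bin reads into 'ref', 'alt' or 'ambiguous' by comparing their two alignment scores."""
    bins = {"ref": [], "alt": [], "ambiguous": []}
    for r in reads:
        diff = r.alt_score - r.ref_score
        if diff >= min_margin:
            bins["alt"].append(r.name)
        elif -diff >= min_margin:
            bins["ref"].append(r.name)
        else:
            # Scores too close to call: the read fits both alleles about equally well.
            bins["ambiguous"].append(r.name)
    return bins

def alt_fraction(bins: dict) -> float:
    """Fraction of informative reads supporting the variant allele (a crude zygosity hint)."""
    informative = len(bins["ref"]) + len(bins["alt"])
    return len(bins["alt"]) / informative if informative else float("nan")

reads = [ReadScores("r1", 100, 90), ReadScores("r2", 80, 100), ReadScores("r3", 95, 96)]
bins = assign_reads(reads)
print(bins)                # {'ref': ['r1'], 'alt': ['r2'], 'ambiguous': ['r3']}
print(alt_fraction(bins))  # 0.5
```

Keeping an explicit "ambiguous" bin avoids forcing reads that lie outside the variant region, and so match both alleles equally, into either allele's view.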

The origins of a novel butterfly wing patterning gene from within a family of conserved cell cycle regulators

Nicola Nadeau , Carolina Pardo-Diaz , Annabel Whibley , Megan Ann Supple , Richard Wallbank , Grace C. Wu , Luana Maroja , Laura Ferguson , Heather Hines , Camilo Salazar , Richard ffrench-Constant , Mathieu Joron , William Owen McMillan , Chris Jiggins
doi: http://dx.doi.org/10.1101/016006

A major challenge in evolutionary biology is to understand the origins of novel structures. The wing patterns of butterflies and moths are derived phenotypes unique to the Lepidoptera. Here we identify a gene that we name poikilomousa (poik), which regulates colour pattern switches in the mimetic Heliconius butterflies. Strong associations between phenotypic variation and DNA sequence variation are seen in three different Heliconius species, in addition to associations between gene expression and colour pattern. Colour pattern variants are also associated with differences in splicing of poik transcripts. poik is a member of the conserved fizzy family of cell cycle regulators. It belongs to a faster-evolving subfamily, the closest functionally characterised orthologue being the cortex gene in Drosophila, which encodes a female germ-line-specific protein involved in meiosis. poik appears to have adopted a novel function in the Lepidoptera and become a major target for natural selection acting on colour and pattern variation in this group.

Mutation rate estimation for 15 autosomal STR loci in a large population from Mainland China

Zhuo Zhao , Hua Wang , Jie Zhang , Zhi-Peng Liu , Ming Liu , Yuan Zhang , Li Sun , Hui Zhang
doi: http://dx.doi.org/10.1101/015875

Short tandem repeats (STRs) are well known as powerful genetic markers and are widely used in studying human population genetics. Compared with conventional genetic markers, STRs have a higher mutation rate. Additionally, mutations at STR loci do not lead to genetic inconsistencies between the genotypes of parents and children; therefore, the analysis of STR mutation is better suited to assessing mutation in populations. In this study, we focused on 15 autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA). DNA samples from a total of 42416 unrelated healthy individuals (19037 trios) from the population of Mainland China, collected between January 2012 and May 2014, were successfully analyzed. We determined the allele frequencies, paternal mutation rates, maternal mutation rates and average mutation rates at the 15 STR loci. We also investigated the relationship between the average mutation rate and paternal age, maternal age, pregnancy time and geographic area. We found that the paternal mutation rate is higher than the maternal mutation rate, and that the paternal, maternal and average mutation rates are positively correlated with paternal age, maternal age and pregnancy time, respectively. Additionally, average mutation rates in coastal areas are higher than those in inland areas. Overall, these results suggest that the 15 autosomal STR loci can provide highly informative polymorphic data for population genetic assessment in Mainland China, and they confirm and extend the application of STR analysis in population genetics.
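In the simplest formulation, a per-locus mutation rate from trio data is the number of observed mutations divided by the number of meioses examined, with each trio contributing one paternal and one maternal transmission per locus. The sketch below assumes every mutation can be assigned a parental origin and uses hypothetical counts; the study's own estimator may treat mutations of unknown parental origin differently.

```python
def locus_mutation_rates(paternal_mut: int, maternal_mut: int, n_trios: int) -> dict:
    """Per-generation mutation rates at one STR locus from trio data.

    Assumes each trio contributes one paternal and one maternal meiosis at the
    locus and that every observed mutation has a known parental origin.
    """
    paternal = paternal_mut / n_trios
    maternal = maternal_mut / n_trios
    # Average rate pools both parental transmissions: 2 meioses per trio.
    average = (paternal_mut + maternal_mut) / (2 * n_trios)
    return {"paternal": paternal, "maternal": maternal, "average": average}

# Hypothetical counts for a single locus.
rates = locus_mutation_rates(paternal_mut=30, maternal_mut=10, n_trios=10_000)
print(rates)  # paternal 0.003, maternal 0.001, average 0.002
```

With these made-up counts the paternal rate is three times the maternal rate, the same direction of difference the abstract reports.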

Recent evolution in Rattus norvegicus is shaped by declining effective population size

Eva E Deinum , Daniel L Halligan , Rob W Ness , Yao-Hua Zhang , Lin Cong , Jian-Xu Zhang , Peter D Keightley
doi: http://dx.doi.org/10.1101/015818

The brown rat, Rattus norvegicus, is both a notorious pest and a frequently used model in biomedical research. By analysing genome sequences of 12 wild-caught brown rats from their ancestral range in NE China, along with the sequence of a black rat, R. rattus, we investigate the selective and demographic forces shaping variation in the genome. Based on silent site diversity, we estimate the recent effective population size (N_e) of this species to be 1.24 × 10^5. We compare patterns of diversity in these genomes with patterns in multiple genome sequences of the house mouse (Mus musculus castaneus), which has a much larger N_e. This reveals an important role for variation in the strength of genetic drift in mammalian genome evolution. By a Pairwise Sequentially Markovian Coalescent (PSMC) analysis of demographic history, we infer that there has been a recent population size bottleneck in wild rats, which we date to approximately 20,000 years ago. Consistent with this, wild rat populations have experienced an increased flux of mildly deleterious mutations, which segregate at higher frequencies in protein-coding genes and conserved noncoding elements (CNEs). This leads to negative estimates of the rate of adaptive evolution (alpha) in proteins and CNEs, a result which we discuss in relation to the strongly positive estimates observed in wild house mice. As a consequence of the population bottleneck, wild rats also show a markedly slower decay of linkage disequilibrium with physical distance than wild house mice.
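Two textbook relations sit behind estimates of this kind. Under neutrality and equilibrium, silent-site diversity satisfies π ≈ 4 N_e μ, so N_e ≈ π / (4μ); and the simple McDonald-Kreitman-style estimator α = 1 - (D_s P_n) / (D_n P_s) turns negative when slightly deleterious nonsynonymous polymorphisms inflate P_n. The sketch applies these formulas to made-up numbers; the paper's estimates may rely on more elaborate methods (for example, ones that model the distribution of fitness effects), so treat this only as an illustration of the direction of the effect.

```python
def effective_population_size(pi_silent: float, mu: float) -> float:
    """N_e from neutral diversity, assuming pi = 4 * N_e * mu (diploid, equilibrium)."""
    return pi_silent / (4.0 * mu)

def alpha_mk(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    Dn, Ds: nonsynonymous and synonymous fixed differences between species.
    Pn, Ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    return 1.0 - (ds * pn) / (dn * ps)

# Hypothetical inputs, chosen only to illustrate the arithmetic.
print(effective_population_size(pi_silent=0.002, mu=2.5e-9))  # about 2.0e5
# An excess of segregating nonsynonymous variants (high Pn/Ps) drives alpha below zero.
print(alpha_mk(dn=10, ds=100, pn=30, ps=200))  # -0.5
```

A bottleneck lets more mildly deleterious variants drift to appreciable frequency, raising P_n relative to P_s, which is one way a negative α can arise without implying that adaptive substitution is absent.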

Speciation in Heliconius Butterflies: Minimal Contact Followed by Millions of Generations of Hybridisation

Simon Henry Martin , Anders Eriksson , Krzysztof M. Kozak , Andrea Manica , Chris D. Jiggins
doi: http://dx.doi.org/10.1101/015800

Documenting the full extent of gene flow during speciation poses a challenge, as species ranges change over time and current rates of hybridisation might not reflect historical trends. Theoretical work has emphasized the potential for speciation in the face of ongoing hybridisation, and the genetic mechanisms that might facilitate this process. However, elucidating how the rate of gene flow between species may have changed over time has proved difficult. Here we use Approximate Bayesian Computation (ABC) to fit a model of speciation between the Neotropical butterflies Heliconius melpomene and Heliconius cydno. These species are ecologically divergent, rarely hybridise and display female hybrid sterility. Nevertheless, previous genomic studies suggest pervasive gene flow between them, extending deep into their past, and potentially throughout the speciation process. By modelling the rates of gene flow during early and later stages of speciation, we find that these species have been hybridising for hundreds of thousands of years, but have not done so continuously since their initial divergence. Instead, it appears that gene flow was rare or absent for as long as a million years in the early stages of speciation. Therefore, by dissecting the timing of gene flow between these species, we are able to reject a scenario of purely sympatric speciation in the face of continuous gene flow. We suggest that the period of minimal contact early in speciation may have allowed for the accumulation of genomic changes that later enabled these species to remain distinct despite a dramatic increase in the rate of hybridisation.
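The core of ABC can be shown with plain rejection sampling: draw parameters from the prior, simulate data, and keep the draws whose simulated summary lies within a tolerance of the observed summary. The sketch below uses a deliberately tiny toy model (an unknown success probability behind 30 successes in 100 trials) instead of the authors' speciation model with its gene-flow parameters and genomic summary statistics; the model, tolerance and number of draws are all hypothetical choices.

```python
import random
from statistics import mean

def abc_rejection(observed, simulate, prior_draw, distance, tolerance, n_draws, rng):
    """Rejection ABC: keep prior draws whose simulated data fall within tolerance of observed."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_binomial(p, rng, n=100):
    """Toy simulator (hypothetical): number of successes in n trials with probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(42)  # seeded for reproducibility
posterior = abc_rejection(
    observed=30,
    simulate=simulate_binomial,
    prior_draw=lambda r: r.random(),  # uniform(0, 1) prior on p
    distance=lambda a, b: abs(a - b),
    tolerance=2,
    n_draws=20_000,
    rng=rng,
)
print(len(posterior), round(mean(posterior), 3))
```

The accepted draws approximate the posterior for p, which should centre near 0.3 here. In model comparison settings like the one in the abstract, the same acceptance logic can be used to weigh competing scenarios (for example, continuous versus interrupted gene flow) by how often each reproduces the observed summaries.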

Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii

Rob W Ness , Andrew D Morgan , Radhakrishnan B Vasanthakrishnan , Nick Colegrave , Peter D Keightley
doi: http://dx.doi.org/10.1101/015693

Describing the process of spontaneous mutation is fundamental to understanding the genetic basis of disease, the threat posed by declining population size in conservation biology, and much of evolutionary biology. However, directly studying spontaneous mutation is difficult because de novo mutations are rare. Mutation accumulation (MA) experiments overcome this by allowing mutations to build up over many generations in the near absence of natural selection. In this study, we sequenced the genomes of 85 MA lines derived from six genetically diverse wild strains of the green alga Chlamydomonas reinhardtii. We identified 6,843 spontaneous mutations, more than any previous study of spontaneous mutation. We observed seven-fold variation in the mutation rate among strains, and found that mutator genotypes arose and dramatically increased the mutation rate in some replicates. We also found evidence for fine-scale heterogeneity in the mutation rate, driven largely by the sequence flanking mutated sites and by clusters of multiple mutations at closely linked sites. There was little evidence, however, for mutation rate heterogeneity between chromosomes or over large genomic regions of 200 kbp. Using logistic regression, we generated a predictive model of the mutability of sites based on their genomic properties, including local GC content, gene expression level and local sequence context. Our model accurately predicted the average mutation rate and natural levels of genetic diversity of sites across the genome. Notably, trinucleotide contexts vary 17-fold in mutation rate between the most mutable and least mutable sites. Our results uncover a rich heterogeneity in the process of spontaneous mutation, both among individuals and across the genome.
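A fold difference between trinucleotide contexts is, at its simplest, a ratio of per-site rates, each computed as the mutations observed in a context divided by the callable sites in that context times the generations of mutation accumulation. The counts below are invented so that the arithmetic is easy to follow; they are not the study's data, and the authors' logistic-regression model goes further by combining several predictors of site mutability.

```python
def context_rates(mutations: dict, sites: dict, generations: float) -> dict:
    """Per-site, per-generation mutation rate for each trinucleotide context.

    Contexts with no callable sites are skipped to avoid division by zero.
    """
    return {
        ctx: mutations[ctx] / (sites[ctx] * generations)
        for ctx in mutations
        if sites.get(ctx)
    }

def fold_range(rates: dict) -> float:
    """Ratio of the highest to the lowest non-zero rate."""
    positive = [r for r in rates.values() if r > 0]
    return max(positive) / min(positive)

# Hypothetical counts: mutations and callable sites per context, 100 generations of MA.
mutations = {"ACG": 34, "ATA": 2, "GCC": 10}
sites = {"ACG": 1000, "ATA": 1000, "GCC": 1000}
rates = context_rates(mutations, sites, generations=100)
print(round(fold_range(rates), 6))  # 17.0
```

With only a few thousand mutations spread over 64 contexts, such raw ratios are noisy, which is one reason to pool information across predictors in a regression model as the abstract describes.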