Mutation rate estimation for 15 autosomal STR loci in a large population from Mainland China
Zhuo Zhao , Hua Wang , Jie Zhang , Zhi-Peng Liu , Ming Liu , Yuan Zhang , Li Sun , Hui Zhang
doi: http://dx.doi.org/10.1101/015875

Short tandem repeats (STRs) are well known as powerful genetic markers and are widely used in studying human population genetics. Compared with conventional genetic markers, STRs have a higher mutation rate. Additionally, the mutations of STR loci do not lead to genetic inconsistencies between the genotypes of parents and children; therefore, the analysis of STR mutation is more suited to assessing mutation in a population. In this study, we focused on 15 autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA). DNA samples from a total of 42416 unrelated healthy individuals (19037 trios) from the population of Mainland China, collected between Jan 2012 and May 2014, were successfully investigated. We determined the allele frequencies, paternal mutation rates, maternal mutation rates and average mutation rates for the 15 STR loci. Furthermore, we investigated the relationship between the average mutation rate and paternal age, maternal age, pregnancy time, and area. We found that the paternal mutation rate is higher than the maternal mutation rate, and that the paternal, maternal, and average mutation rates are positively correlated with paternal age, maternal age, and time, respectively. Additionally, the average mutation rates of coastal areas are higher than those of inland areas. Overall, these results suggest that the 15 autosomal STR loci can provide highly informative polymorphic data for population genetic assessment in Mainland China, and they confirm and extend the application of STR analysis in population genetics.
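
The abstract does not say how the per-locus rates were computed. A common approach, sketched here only as an illustration, is to divide the number of observed parent-to-child mutations by the number of meioses examined and attach a Wilson score interval. The function name `mutation_rate` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def mutation_rate(mutations, meioses, z=1.96):
    """Return (rate, lower, upper) for a per-meiosis mutation rate.

    mutations: observed parent-to-child mutation events at one locus.
    meioses:   number of parent-to-child transmissions examined.
    z:         normal quantile for the interval (1.96 gives about 95%).
    The bounds come from the Wilson score interval for a binomial proportion.
    """
    if meioses <= 0 or not 0 <= mutations <= meioses:
        raise ValueError("need 0 <= mutations <= meioses and meioses > 0")
    p = mutations / meioses
    denom = 1 + z * z / meioses
    centre = (p + z * z / (2 * meioses)) / denom
    half = z * math.sqrt(p * (1 - p) / meioses + z * z / (4 * meioses ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts, for illustration only (not the paper's data).
rate, lower, upper = mutation_rate(20, 20000)
print(f"rate={rate:.4g}, interval=({lower:.4g}, {upper:.4g})")
```

In a trio design, each trio contributes one paternal and one maternal meiosis per locus, so paternal and maternal rates would be obtained by calling the function separately on each parent's counts.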

Recent evolution in Rattus norvegicus is shaped by declining effective population size
Eva E Deinum , Daniel L Halligan , Rob W Ness , Yao-Hua Zhang , Lin Cong , Jian-Xu Zhang , Peter D Keightley
doi: http://dx.doi.org/10.1101/015818

The brown rat, Rattus norvegicus, is both a notorious pest and a frequently used model in biomedical research. By analysing genome sequences of 12 wild-caught brown rats from their ancestral range in NE China, along with the sequence of a black rat, R. rattus, we investigate the selective and demographic forces shaping variation in the genome. We estimate that the recent effective population size (N_e) of this species is 1.24 x 10^5, based on silent site diversity. We compare patterns of diversity in these genomes with patterns in multiple genome sequences of the house mouse (Mus musculus castaneus), which has a much larger N_e. This reveals an important role for variation in the strength of genetic drift in mammalian genome evolution. By a Pairwise Sequentially Markovian Coalescent (PSMC) analysis of demographic history, we infer that there has been a recent population size bottleneck in wild rats, which we date to approximately 20,000 years ago. Consistent with this, wild rat populations have experienced an increased flux of mildly deleterious mutations, which segregate at higher frequencies in protein-coding genes and conserved noncoding elements (CNEs). This leads to negative estimates of the rate of adaptive evolution (alpha) in proteins and CNEs, a result which we discuss in relation to the strongly positive estimates observed in wild house mice. As a consequence of the population bottleneck, wild rats also show a markedly slower decay of linkage disequilibrium with physical distance than wild house mice.

Speciation in Heliconius Butterflies: Minimal Contact Followed by Millions of Generations of Hybridisation
Simon Henry Martin , Anders Eriksson , Krzysztof M. Kozak , Andrea Manica , Chris D. Jiggins
doi: http://dx.doi.org/10.1101/015800

Documenting the full extent of gene flow during speciation poses a challenge, as species ranges change over time and current rates of hybridisation might not reflect historical trends. Theoretical work has emphasized the potential for speciation in the face of ongoing hybridisation, and the genetic mechanisms that might facilitate this process. However, elucidating how the rate of gene flow between species may have changed over time has proved difficult. Here we use Approximate Bayesian Computation (ABC) to fit a model of speciation between the Neotropical butterflies Heliconius melpomene and Heliconius cydno. These species are ecologically divergent, rarely hybridize and display female hybrid sterility. Nevertheless, previous genomic studies suggest pervasive gene flow between them, extending deep into their past, and potentially throughout the speciation process. By modelling the rates of gene flow during early and later stages of speciation, we find that these species have been hybridising for hundreds of thousands of years, but have not done so continuously since their initial divergence. Instead, it appears that gene flow was rare or absent for as long as a million years in the early stages of speciation. Therefore, by dissecting the timing of gene flow between these species, we are able to reject a scenario of purely sympatric speciation in the face of continuous gene flow. We suggest that the period of minimal contact early in speciation may have allowed for the accumulation of genomic changes that later enabled these species to remain distinct despite a dramatic increase in the rate of hybridisation.

Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii

Rob W Ness , Andrew D Morgan , Radhakrishnan B Vasanthakrishnan , Nick Colegrave , Peter D Keightley
doi: http://dx.doi.org/10.1101/015693

Describing the process of spontaneous mutation is fundamental for understanding the genetic basis of disease, the threat posed by declining population size in conservation biology, and in much evolutionary biology. However, directly studying spontaneous mutation is difficult because of the rarity of de novo mutations. Mutation accumulation (MA) experiments overcome this by allowing mutations to build up over many generations in the near absence of natural selection. In this study, we sequenced the genomes of 85 MA lines derived from six genetically diverse wild strains of the green alga Chlamydomonas reinhardtii. We identified 6,843 spontaneous mutations, more than any other study of spontaneous mutation. We observed seven-fold variation in the mutation rate among strains and that mutator genotypes arose, increasing the mutation rate dramatically in some replicates. We also found evidence for fine-scale heterogeneity in the mutation rate, driven largely by the sequence flanking mutated sites, and by clusters of multiple mutations at closely linked sites. There was little evidence, however, for mutation rate heterogeneity between chromosomes or over large genomic regions of 200Kbp. Using logistic regression, we generated a predictive model of the mutability of sites based on their genomic properties, including local GC content, gene expression level and local sequence context. Our model accurately predicted the average mutation rate and natural levels of genetic diversity of sites across the genome. Notably, trinucleotides vary 17-fold in rate between the most mutable and least mutable sites. Our results uncover a rich heterogeneity in the process of spontaneous mutation both among individuals and across the genome.

Quality assessment for different haplotyping methods and GWAS sensitivity to phasing errors

Giovanni Busonera , Marco Cogoni , Gianluigi Zanetti
doi: http://dx.doi.org/10.1101/015669

In this report we present a multimarker association tool (Flash) based on a novel algorithm to generate haplotypes from raw genotype data. It belongs to the entropy minimization class of methods and is composed of a two-stage deterministic-heuristic part and an optional stochastic optimization. The algorithm scales well to huge datasets, running faster than competing tools such as BEAGLE and MACH while maintaining comparable accuracy. A quality assessment of the results is carried out by comparing the switch error. Finally, the haplotypes are used to perform a haplotype-based Genome-wide Association Study (GWAS). The association results are compared with a multimarker and a single SNP association test performed with Plink. Our experiments confirm that the multimarker association test can be more powerful than the single SNP one, as stated in the literature. Moreover, Flash and Plink show similar results for the multimarker association test, but Flash reduces the computation time by about an order of magnitude when using haplotypes of 5 SNPs.
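
For readers unfamiliar with the switch error mentioned above, the following is a minimal sketch of how the metric is usually defined: among consecutive heterozygous sites, count how often the inferred phase flips relative to the true phase. It is not Flash's code; the function name `switch_error_rate` and the input format (one pair of alleles per site) are assumptions made for illustration.

```python
def switch_error_rate(truth, inferred):
    """Fraction of adjacent heterozygous-site pairs where the phase flips.

    truth, inferred: equal-length sequences of (allele_on_hap1, allele_on_hap2)
    pairs, one per site. This input format is assumed for illustration.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")

    # For each heterozygous site, record whether the inferred haplotypes are
    # in the same orientation as the truth (True) or swapped (False).
    orientations = []
    for t, p in zip(truth, inferred):
        t, p = tuple(t), tuple(p)
        if t[0] == t[1]:
            continue  # phase is undefined at homozygous sites
        if p == t:
            orientations.append(True)
        elif p == (t[1], t[0]):
            orientations.append(False)
        else:
            raise ValueError("inferred genotype differs from true genotype")

    if len(orientations) < 2:
        return 0.0  # no adjacent heterozygous pair, so no switch is possible

    # A switch occurs whenever the orientation changes between neighbours.
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)


# Toy example with made-up alleles: two switches over three adjacent pairs.
truth = [(0, 1), (0, 1), (1, 0), (0, 0), (0, 1)]
inferred = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 1)]
print(switch_error_rate(truth, inferred))
```

Because only changes in orientation are counted, an inferred haplotype pair that is globally swapped relative to the truth scores zero, which is the intended behaviour: the labels "haplotype 1" and "haplotype 2" are arbitrary.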

Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

Mark Lipson , Po-Ru Loh , Sriram Sankararaman , Nick Patterson , Bonnie Berger , David Reich
doi: http://dx.doi.org/10.1101/015560

The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Classical methods based on sequence divergence have yielded significantly larger values than more recent approaches based on counting de novo mutations in family pedigrees. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of 1.65 +/- 0.10 x 10^(-8) mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes.

Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations
Marc Haber , Massimo Mezzavilla , Yali Xue , David Comas , Paolo Gasparini , Pierre Zalloua , Chris Tyler-Smith
doi: http://dx.doi.org/10.1101/015396

The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain underrepresented in genetic studies and have a complex history including a major geographic displacement during World War One. Here, we analyse genome-wide variation in 173 Armenians and compare them to 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3,000 and ~2,000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1,200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of the Armenian ancestry may originate from an ancestral population best represented by Neolithic Europeans.