Agriculture driving male expansion in Neolithic Time
Chuan-Chao Wang, Yunzhi Huang, Shao-Qing Wen, Chun Chen, Li Jin, Hui Li
(Submitted on 27 Nov 2013)
The emergence of agriculture is suggested to have driven extensive human population growth. However, genetic evidence from maternal mitochondrial genomes suggests that major population expansions began before the emergence of agriculture. Therefore, the role that agriculture played in the initial population expansions remains controversial. Here, we analyzed a set of globally distributed whole Y chromosome and mitochondrial genomes of 526 male samples from the 1000 Genomes Project. We found that most major paternal lineage expansions coalesced in Neolithic Time. The estimated effective population sizes through time revealed strong evidence for a 10- to 100-fold increase in population growth of males with the advent of agriculture. This sex-biased Neolithic expansion might result from the reduction in hunting-related mortality of males.
Population genetics and substitution models of adaptive evolution
Mario dos Reis
(Submitted on 26 Nov 2013)
The ratio of non-synonymous to synonymous substitutions ω(=dN/dS) has been widely used as a measure of adaptive evolution in protein coding genes. Omega can be defined in terms of population genetics parameters as the fixation ratio of selected vs. neutral mutants. Here it is argued that approaches based on the infinite sites model are not appropriate to define ω for single codon locations. Simple models of amino acid substitution with reversible mutation and selection are analysed, and used to define ω under several evolutionary scenarios. In most practical cases ω1 can be sometimes expected for single locations at equilibrium. An example with influenza data is discussed.
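As a rough illustration of ω as a fixation ratio, the sketch below uses the classical diffusion (Kimura) approximation, under which the fixation rate of a mutant with scaled selection coefficient S, relative to a neutral mutant, is S / (1 - e^(-S)). The symbol S, the function name, and the choice of this particular approximation are assumptions made here for illustration; the abstract does not say which expressions the paper uses.

```python
import math


def omega_from_scaled_selection(S: float) -> float:
    """Relative fixation rate of a selected vs. a neutral mutant.

    Assumes the diffusion approximation S / (1 - exp(-S)), where S is the
    population-scaled selection coefficient (an assumption of this sketch).
    """
    if abs(S) < 1e-8:
        # First-order expansion around S = 0 avoids a 0/0 division.
        return 1.0 + S / 2.0
    if S < -700.0:
        # exp(-S) would overflow; the ratio is effectively zero here.
        return 0.0
    # 1 - exp(-S) is computed as -expm1(-S) for numerical accuracy.
    return S / -math.expm1(-S)


if __name__ == "__main__":
    for S in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"S = {S:+.1f}  omega = {omega_from_scaled_selection(S):.4f}")
```

Under this approximation, deleterious mutants (S < 0) give a ratio below 1, neutral mutants give exactly 1, and advantageous mutants (S > 0) give a ratio above 1.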
The effect of linkage on establishment and survival of locally beneficial mutations
Simon Aeschbacher, Reinhard Buerger
(Submitted on 25 Nov 2013)
When organisms adapt to spatially heterogeneous environments, selection may drive divergence at multiple genes. If populations under divergent selection also exchange migrants, we expect genetic differentiation to be high at selected loci, relative to the baseline caused by migration and genetic drift. Indeed, empirical studies have found peaks of putatively adaptive differentiation. These are highly variable in length, some of them extending over several hundreds of thousands of base pairs. How can such ‘islands of differentiation’ be explained? Physical linkage produces elevated levels of differentiation at loci close to genes under selection. However, whether this is enough to account for the observed patterns of divergence is not well understood. Here, we investigate the fate of a locally beneficial mutation that arises in linkage to an existing migration-selection polymorphism and derive two important quantities: the probability that the mutation becomes established, and the expected time to its extinction. We find that intermediate levels of recombination are sometimes favourable, and that physical linkage can lead to strongly elevated invasion probabilities and extinction times. We provide a rule of thumb for when this is the case. Moreover, we quantify the long-term effect of polygenic local adaptation on linked neutral variation.
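To make "probability that the mutation becomes established" concrete, here is a minimal sketch of the textbook single-type branching-process calculation. It assumes one unlinked locus, no migration, and a Poisson number of offspring with mean 1 + s. This is a much simpler setting than the linked migration-selection model the abstract describes, and the function name and parameters are hypothetical.

```python
import math


def establishment_probability(s: float, tol: float = 1e-12,
                              max_iter: int = 1_000_000) -> float:
    """Establishment probability of a single mutant copy.

    Assumes a Galton-Watson process with Poisson(1 + s) offspring. The
    extinction probability q is the smallest root of q = exp((1 + s)(q - 1)),
    found by fixed-point iteration from q = 0.
    """
    if s <= 0.0:
        # Critical or subcritical process: extinction is certain.
        return 0.0
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


if __name__ == "__main__":
    for s in (0.01, 0.05, 0.1):
        print(f"s = {s:.2f}  P(establishment) = {establishment_probability(s):.4f}")
```

For small s the result is close to Haldane's classical approximation of 2s, which gives a baseline against which the effects of linkage and migration can be considered.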
Interspecific Introgressive Origin of Genomic Diversity in the House Mouse
Kevin J. Liu, Ying Song, Michael H. Kohn, Luay Nakhleh
(Submitted on 22 Nov 2013)
We report on a genome-wide scan for introgression in a eukaryote. The scan identified kilobase-to-megabase-long regions of introgressive origin involving Mus spretus in six Mus musculus domesticus chromosomes, based on genomes sampled from and near the European range of sympatry. Our analyses point to the introgression of both adaptive driver and linked passenger loci. Introgression could transfer traits, such as the discovered warfarin resistance in European M. m. domesticus, and could create new traits, as we infer using a functional network analysis. Our study sheds new light on the extent of adaptive introgression and calls for new analyses of eukaryotic genomes that explicitly account for the possibility of introgression.
Generation of high-resolution a priori Y-chromosome phylogenies using “next-generation” sequencing data
Gregory R Magoon, Raymond H Banks, Christian Rottensteiner, Bonnie E Schrack, Vincent O Tilroe, Andrew J Grierson
An approach for generating high-resolution a priori maximum parsimony Y-chromosome (chrY) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (next-generation) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty in cases for which “no-calls” (through lack of mapped reads or otherwise) at particular sites preclude a precise placement of the mutation. The approach leverages careful variant site filtering and a novel iterative reweighting procedure to generate high-accuracy trees while considering variants in regions of chrY that had previously been excluded from analyses based on short-read sequencing data. It is argued that the proposed approach is also superior to previous region-based filtering approaches in that it adapts to the quality of the underlying data and will automatically allow the scope of sites considered to expand as the underlying data quality improves (e.g. through longer read lengths). Key related issues, including calling of genotypes for the hemizygous chrY, reliability of variant results, read mismappings and “heterozygous” genotype calls, and the mutational stability of different variants, are discussed and taken into account. The methodology is demonstrated through application to a dataset consisting of 1292 male samples from diverse populations and haplogroups, with the majority coming from low-coverage sequencing by the 1000 Genomes Project. Application of the tree-generation approach to these data produces a tree involving over 120,000 chrY variant sites (about 45,000 sites if singletons are excluded). The utility of this approach in refining the Y-chromosome phylogenetic tree is demonstrated by examining results for several haplogroups.
The results indicate a number of new branches on the Y-chromosome phylogenetic tree, many of them subdividing known branches, but also including some that inform the presence of additional levels along the trunk of the tree. Finally, opportunities for extensions of this phylogenetic analysis approach to other types of genetic data are examined.
Computational inference beyond Kingman’s coalescent
Jere Koskela, Paul Jenkins, Dario Spano
(Submitted on 22 Nov 2013)
Full likelihood inference under Kingman’s coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) method have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general Λ- and Ξ-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites Λ- and Ξ-coalescents, and use them to obtain “approximately optimal” IS and PAC algorithms for Λ-coalescents, yielding substantial gains in efficiency over existing methods.
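For orientation, the sketch below simulates the baseline model only: under the standard Kingman coalescent, while k lineages remain, the waiting time to the next pairwise merger is exponential with rate k(k - 1)/2 in coalescent time units. It does not implement importance sampling, the PAC method, or Λ- and Ξ-coalescents, and the function names are hypothetical.

```python
import random


def simulate_tmrca(n: int, rng: random.Random) -> float:
    """Simulate the time to the most recent common ancestor of n lineages.

    Assumes the standard Kingman coalescent: with k lineages, the time to
    the next coalescence is Exponential(k * (k - 1) / 2).
    """
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
        k -= 1  # one pair of lineages merges
    return t


def mean_tmrca(n: int, replicates: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected TMRCA for a sample of size n."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    n = 10
    print("simulated mean TMRCA:", mean_tmrca(n, 20000, seed=1))
    print("theoretical mean    :", 2.0 * (1.0 - 1.0 / n))
```

The simulated mean can be compared with the known expectation 2(1 - 1/n), which is a quick sanity check before moving to models that allow multiple mergers.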
Comment on “TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions” by Kim et al.
Alexander Dobin, Thomas R Gingeras
In the recent paper by Kim et al. (Genome Biology, 2013. 14(4): p. R36) the accuracy of TopHat2 was compared to other RNA-seq aligners. In this comment we re-examine the most important analyses from this paper and identify several deficiencies that significantly diminished the performance of some of the aligners, including incorrect choice of mapping parameters, unfair comparison metrics, and unrealistic simulated data. Using STAR (Dobin et al., Bioinformatics, 2013. 29(1): p. 15-21) as an exemplar, we demonstrate that correcting these deficiencies makes its accuracy equal to or better than that of TopHat2. Furthermore, this exercise highlighted some serious issues with the TopHat2 algorithms, such as poor recall of alignments with a moderate (>3) number of mismatches, low sensitivity and high false discovery rate for splice junction detection, loss of precision for the realignment algorithm, and a large number of false chimeric alignments.
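The metrics named above can be sketched for splice junction detection as follows. The sketch assumes junctions are represented as (chromosome, intron start, intron end) tuples and uses the usual textbook definitions of sensitivity (recall), precision, and false discovery rate. These may differ in detail from the definitions used in either paper, and the function name and the toy data are hypothetical.

```python
def junction_metrics(predicted, truth):
    """Compare predicted splice junctions against a truth set.

    Each junction is a hashable tuple such as (chrom, start, end); this
    representation is an assumption of the sketch.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # junctions found and real
    fp = len(predicted - truth)   # junctions found but not real
    fn = len(truth - predicted)   # real junctions that were missed
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "fdr": fp / (tp + fp) if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    truth = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 90)}
    predicted = {("chr1", 100, 200), ("chr2", 50, 90),
                 ("chr2", 500, 700), ("chr3", 1, 2)}
    print(junction_metrics(predicted, truth))
```

Reporting sensitivity and false discovery rate together matters because an aligner can raise one at the expense of the other, for example by reporting more candidate junctions.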