On the importance of being structured: instantaneous coalescence rates and a re-evaluation of human evolution

Posted on November 10, 2015 by schraib

Olivier Mazet, Willy Rodríguez, Simona Grusea, Simon Boitard, Lounès Chikhi

Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods ignore population structure and reconstruct a history characterized by population size changes under the assumption that species behave as panmictic units. This is potentially problematic since population structure can generate spurious signals of population size change. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of estimates of misleading, if not meaningless, parameters. In a context of model uncertainty (panmixia versus structure), genomic data may thus not necessarily lead to improved statistical inference.
We consider two haploid genomes and develop a theory that explains why any demographic model (with or without population size changes) will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We introduce a new parameter, the IICR (inverse instantaneous coalescence rate), and show that it is equivalent to a population size only in panmictic models, and mostly misleading for structured models. We argue that this general issue affects all population genetics methods ignoring population structure. We take the PSMC method as an example and show that it infers population size changes that never took place. We apply our approach to human genomic data and find a reduction in gene flow at the start of the Pleistocene, a major increase throughout the Middle Pleistocene, and an abrupt disconnection preceding the emergence of modern humans.
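To make the IICR argument concrete, here is a minimal Python sketch, assuming the symmetric n-island model with n demes of equal, constant size, time measured in units of the deme size (so two lineages in the same deme coalesce at rate 1) and a scaled migration rate M. The function name and this parameterization are ours, not the authors' code. Two lineages sampled from one deme are either in the same deme or in different demes, and the IICR is P(T > t) divided by the coalescence-time density at t. Under panmixia with constant size the IICR would be flat; here no deme ever changes size, yet the IICR climbs from 1 toward a plateau, the pattern a method assuming panmixia would read as a population decline toward the present.

```python
import math


def iicr_n_island(t: float, n: int, M: float) -> float:
    """IICR at (backward) time t for two haploid lineages sampled in the same deme.

    Assumed model: symmetric n-island model with constant deme sizes, time in
    units of the deme size (within-deme coalescence rate 1) and scaled
    migration rate M, so each lineage migrates at rate M/2.
    """
    if n < 2 or M <= 0:
        raise ValueError("need n >= 2 demes and M > 0")
    # Sub-generator over the two transient states: (same deme, different demes).
    a, b = -(1.0 + M), M              # same deme: coalesce at rate 1, separate at rate M
    c, d = M / (n - 1), -M / (n - 1)  # different demes: meet again at rate M/(n-1)
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4.0 * det)  # positive, so two distinct real eigenvalues
    l1, l2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    e1, e2 = math.exp(l1 * t), math.exp(l2 * t)
    # First row of exp(Qt) from Sylvester's formula for a 2x2 matrix.
    p_same = (e1 * (a - l2) - e2 * (a - l1)) / (l1 - l2)
    p_diff = b * (e1 - e2) / (l1 - l2)
    # IICR(t) = P(T > t) / f(t), and coalescence can only happen from "same deme".
    return (p_same + p_diff) / p_same


for t in (0.0, 0.5, 2.0, 10.0, 50.0):
    print(t, round(iicr_n_island(t, n=10, M=1.0), 3))
```

For n = 10 and M = 1 the curve starts at 1 at t = 0 and rises monotonically to a plateau of about 18.5, the inverse of the slowest decay rate of the two-state chain.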

Negative selection maintains transcription factor binding motifs in human cancer

Posted on November 10, 2015 by schraib

I. E. Vorontsov, I. V. Kulakovskiy, G. Khimulya, E. N. Lukianova, D. D. Nikolaeva, I. A. Eliseeva, V. J. Makeev

Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and the expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer, and mutations in the binding sites of selected transcription factors have been found to be under positive selection. However, negative selection of mutations in coding regions is elusive, and the significance of negative selection in non-coding regions remains controversial.
Here we present an analysis of transcription factors whose binding sites co-localize with non-coding variants. To avoid statistical bias, we account for the mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of binding motifs than expected by chance. Such conservation of motifs is even more pronounced in DNase-accessible regions.
Our data demonstrate negative selection against binding site alterations and suggest that this selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors and their conserved binding motifs can reveal cellular regulatory pathways crucial for the survival of various human cancers.
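The observed-versus-expected comparison behind this kind of result can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes you already have, for every motif position, the expected number of motif-altering mutations under a null model that reflects the cancer type's mutation signature, and it uses a Poisson approximation for the total count. A ratio below 1 with a small lower-tail probability is the kind of depletion interpreted as negative selection.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), accumulated term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def motif_depletion(site_rates, observed: int):
    """Compare observed motif-altering mutations with a signature-aware expectation.

    site_rates: hypothetical input giving, for each motif position, the expected
    number of motif-altering mutations under a null model that already reflects
    the cancer type's mutation signature (e.g. trinucleotide-context rates).
    Returns (expected, observed/expected, one-sided Poisson P(X <= observed)).
    """
    expected = sum(site_rates)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return expected, observed / expected, poisson_cdf(observed, expected)


# Hypothetical numbers: 1000 motif positions, 0.02 expected hits each.
expected, ratio, p = motif_depletion([0.02] * 1000, observed=8)
print(f"expected={expected:.1f} ratio={ratio:.2f} P(X<=8)={p:.2g}")
```

The Poisson model treats sites as independent with small per-site rates; a real analysis would also need to handle multiple testing across transcription factors.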

Evolutionary history of the global emergence of the Escherichia coli epidemic clone ST131

Posted on November 9, 2015 by schraib

Nicole Stoesser, Anna Sheppard, Louise Pankhurst, Nicola de Maio, Catrin E Moore, Robert Sebra, Paul Turner, Luke W Anson, Andrew Kasarskis, Elizabeth M Batty, Veronica Kos, Daniel J Wilson, Rattanaphone Phetsouvanh, David Wyllie, Evgeni Sokurenko, Amee R Manges, Timothy J Johnson, Lance B Price, Timothy E. A. Peto, James R Johnson, Xavier Didelot, Ann Sarah Walker, Derrick W Crook, Modernising Medical Microbiology Informatics Group

bioRxiv doi: http://dx.doi.org/10.1101/030668

Background: Escherichia coli sequence type 131 (ST131) has emerged globally as the predominant lineage within this clinically important species, and its association with fluoroquinolone and extended-spectrum cephalosporin resistance has a significant impact on treatment. The evolutionary histories of this lineage, and of important antimicrobial resistance elements within it, remain poorly defined. Results: This study of the largest worldwide collection (n = 215) of sequenced ST131 E. coli isolates to date demonstrates that clonal expansion of two previously recognized antimicrobial-resistant clades, C1/H30R and C2/H30Rx, started around 25 years ago, consistent with the widespread introduction of fluoroquinolones and extended-spectrum cephalosporins in clinical medicine. These two clades appear to have emerged in the United States, with the expansion of the C2/H30Rx clade driven by the acquisition of a blaCTX-M-15-containing IncFII-like plasmid that has subsequently undergone extensive rearrangement. Several other evolutionary processes influencing the trajectory of this drug-resistant lineage are described, including sporadic acquisitions of CTX-M resistance plasmids, and chromosomal integration of blaCTX-M within sub-clusters followed by vertical evolution. These processes are also occurring for another family of CTX-M gene variants more recently observed amongst ST131, the blaCTX-M-14/14-like group. Conclusions: The complexity of the evolutionary history of ST131 has important implications for antimicrobial resistance surveillance, epidemiological analysis, and control of emerging clinical lineages of E. coli. These data also highlight the global imperative to reduce specific antibiotic selection pressures, and demonstrate the important and varied roles played by plasmids and other mobile genetic elements in the perpetuation of antimicrobial resistance within lineages.

Strongly asymmetric hybridization barriers shape the origin of a new polyploid species and its hybrid ancestor

Posted on November 9, 2015 by schraib

Mario Vallejo-Marin, Arielle Cooley, Michelle Qi, Madison Folmer, Michael McKain, Joshua Puzey

bioRxiv doi: http://dx.doi.org/10.1101/030932

Premise of the study: Hybridization between diploids and tetraploids can lead to new allopolyploid species, often via a triploid intermediate. Viable triploids are often produced asymmetrically, with greater success observed for maternal-excess crosses where the mother has a higher ploidy than the father. Here we investigate the evolutionary origins of Mimulus peregrinus, an allopolyploid recently derived from the triploid M. x robertsii, to determine whether reproductive asymmetry has shaped the formation of this new species. Methods: We used reciprocal crosses between the diploid (M. guttatus) and tetraploid (M. luteus) progenitors to determine the viability of triploid hybrids resulting from paternal- versus maternal-excess crosses. To investigate whether experimental results predict patterns seen in the field, we performed parentage analyses comparing natural populations of M. peregrinus to its diploid, tetraploid, and triploid progenitors. Organellar sequences obtained from pre-existing genomic data, supplemented with additional genotyping, were used to establish the maternal ancestry of multiple M. peregrinus and M. x robertsii populations. Key results: We find strong evidence for asymmetric origins of M. peregrinus, but opposite to the common pattern, with paternal-excess crosses significantly more successful than maternal-excess crosses. These results successfully predicted hybrid formation in nature: 111 of 114 M. x robertsii individuals, and 27 of 27 M. peregrinus individuals, had an M. guttatus maternal haplotype. Conclusion: This study, which includes assembly of the first Mimulus chloroplast genome, demonstrates the utility of parentage analysis through genome skimming. We highlight the benefits of complementing genomic analyses with experimental approaches to understand asymmetry in allopolyploid speciation.

Climate and developmental plasticity: interannual variability in grapevine leaf morphology

Posted on November 9, 2015 by schraib

Daniel H Chitwood, Susan M Rundell, Darren Y Li, Quaneisha L Woodford, Tommy T Yu, Jose R Lopez, Danny Greenblatt, Julie Kang, Jason P Londo

bioRxiv doi: http://dx.doi.org/10.1101/030957

Leaf shape is dynamic: it changes over evolutionary time between species, within a single plant producing differently shaped leaves at successive nodes, during the development of a single leaf as it allometrically expands, and in response to the environment. Notably, both the dissection and the size of leaves correlate strongly with temperature and precipitation, in the paleorecord as well as in extant populations. Yet a morphometric model integrating evolutionary, developmental, and environmental effects on leaf shape is lacking. Here, we continue a morphometric analysis of >5,500 leaves representing 270 grapevines of multiple Vitis species across two growing seasons. Leaves are paired one-to-one between growing seasons, vine to vine, accounting for developmental context. Linear Discriminant Analysis reveals shape features that specifically define growing season, regardless of species or developmental context. The discriminating shape feature, a more pronounced distal sinus, is associated with the colder, drier growing season, consistent with patterns observed in the paleorecord. We discuss the implications of such plasticity in a long-lived woody perennial, such as grapevine, with respect to the evolution and functionality of plant morphology and changes in climate.
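Linear Discriminant Analysis looks for the direction that best separates groups relative to the spread within each group. A two-class, two-feature Fisher discriminant is sketched below in plain Python. The features (for instance, a distal sinus depth and a leaf size measure) and the toy values are hypothetical; the study itself works with many more shape variables and groups.

```python
def fisher_lda_2d(class0, class1):
    """Fisher discriminant direction w = S_w^{-1} (m1 - m0) for 2-D features."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for pts, m in ((class0, m0), (class1, m1)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter is singular")
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Closed-form inverse of the 2x2 scatter matrix applied to the mean difference.
    w = ((syy * diff[0] - sxy * diff[1]) / det,
         (-sxy * diff[0] + sxx * diff[1]) / det)
    return w, m0, m1


def project(w, p):
    """Score of a point along the discriminant axis."""
    return w[0] * p[0] + w[1] * p[1]


# Hypothetical (sinus depth, leaf size) measurements for two growing seasons.
season_a = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3)]
season_b = [(2.0, 2.0), (2.2, 2.2), (1.9, 1.9), (2.1, 2.4)]
w, m0, m1 = fisher_lda_2d(season_a, season_b)
print([round(project(w, p), 2) for p in season_a + season_b])
```

The weights in w show which feature carries the between-season signal once within-season variation is accounted for, which is how a single feature such as sinus depth can emerge as discriminating.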

An Approximate Markov Model for the Wright-Fisher Diffusion

Posted on November 9, 2015 by schraib

Anna Ferrer-Admetlla, Christoph Leuenberger, Jeffrey D Jensen, Daniel Wegmann

bioRxiv doi: http://dx.doi.org/10.1101/030940

The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging question in population genetics, since both processes may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in comparisons of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly when considering multi-locus data sets. To overcome these issues, we introduce a novel, discrete approximation for diffusion processes, termed the mean transition time approximation, which preserves the long-term behavior of the underlying continuous diffusion process. We then derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright-Fisher model and demonstrate that our approximation is well suited to describe allele trajectories through time, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy, and further extend this model to also infer the rates of sequencing errors and mutations. We finally apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
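For reference, the exact discrete model that such an approximation targets can be written as a hidden Markov model over allele counts. The sketch below is not the authors' mean transition time approximation: it is the full (N+1)-state haploid Wright-Fisher chain with genic selection s, binomial sampling at each time point, and a uniform prior over segregating counts, all of which are assumptions of this illustration. Each generation costs on the order of N² operations, which is the bottleneck that a coarse approximation with only a few states is meant to relieve.

```python
import math


def wf_transition(N: int, s: float):
    """Transition matrix of a haploid Wright-Fisher model with genic selection s.

    From count i, the selected frequency is p' = p(1+s)/(1+ps) with p = i/N,
    and the next generation's count is Binomial(N, p').
    """
    P = []
    for i in range(N + 1):
        p = i / N
        ps = p * (1 + s) / (1 + p * s)
        P.append([math.comb(N, j) * ps ** j * (1 - ps) ** (N - j) for j in range(N + 1)])
    return P


def step(dist, P):
    """Advance a distribution over allele counts by one generation."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def log_likelihood(samples, N: int, s: float) -> float:
    """Forward-algorithm log-likelihood of an allele-count time series.

    samples: (generation, sample_size, derived_count) tuples sorted by generation.
    Assumes a uniform prior over segregating counts 1..N-1 at the first sample.
    """
    P = wf_transition(N, s)
    dist = [0.0] + [1.0 / (N - 1)] * (N - 1) + [0.0]
    loglik, gen = 0.0, samples[0][0]
    for g, n, k in samples:
        for _ in range(g - gen):
            dist = step(dist, P)
        gen = g
        # Emission: binomial sampling of n genomes given population count i.
        dist = [d * math.comb(n, k) * (i / N) ** k * (1 - i / N) ** (n - k)
                for i, d in enumerate(dist)]
        total = sum(dist)
        loglik += math.log(total)
        dist = [d / total for d in dist]  # rescale to avoid underflow
    return loglik


data = [(0, 20, 4), (10, 20, 8), (20, 20, 13), (30, 20, 17)]  # toy time series
for s in (-0.1, 0.0, 0.1):
    print(s, round(log_likelihood(data, N=50, s=s), 2))
```

Scanning s (and N) over a grid and comparing log-likelihoods gives a crude single-locus estimate; the Bayesian, multi-locus machinery described in the abstract goes well beyond this.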

A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle

Posted on November 9, 2015 by schraib

Hubert Pausch, Reiner Emmerling, Hermann Schwarzenbacher, Ruedi Fries

bioRxiv doi: http://dx.doi.org/10.1101/030981

Background: The availability of whole-genome sequence data from key ancestors provides an exhaustive catalogue of polymorphic sites segregating within and across cattle breeds. Sequence variants from key ancestors can be imputed in animals that have been genotyped using medium- and high-density genotyping arrays. Association analysis with imputed sequences, particularly if applied to multiple traits simultaneously, is a very powerful approach to revealing candidate causal variants underlying complex phenotypes. Results: We used whole-genome sequence data from 157 key ancestors of the German Fleckvieh population to impute 20 561 798 sequence variants in 10 363 animals that had (partly imputed) array-derived genotypes at 634 109 SNPs. The imputed sequence data were enriched for rare variants. Association studies with imputed sequence variants were performed using seven correlated udder conformation traits as response variables. The calculation of an approximate multi-trait test statistic enabled us to detect twelve major QTL (P < 2.97 × 10⁻⁹) controlling different aspects of mammary gland morphology. Imputed sequence variants were the most significantly associated variants at eleven QTL, whereas the top association signal at a QTL on BTA14 resulted from an array-derived variant. Seven QTL were associated with multiple phenotypes. Most QTL were located in non-coding regions of the genome, although in close proximity to plausible candidate genes for mammary gland morphology (SP5, GC, NPFFR2, CRIM1, RXFP2, TBX5, RBM19, ADAM12). Conclusions: Association analysis with imputed sequence variants allows QTL characterization at maximum resolution. Multi-trait approaches can reveal QTL that are not detected in single-trait association studies. Most QTL for udder conformation traits were located in non-coding elements of the genome, suggesting that regulatory mutations are the major determinants of variation in mammary gland morphology in cattle.
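The abstract does not spell out the multi-trait statistic. One widely used approximate form, assumed here, combines a variant's signed single-trait statistics t into the quadratic form t' V⁻¹ t, where V is the correlation matrix of those statistics across traits (typically estimated over many mostly null variants); under the null it is approximately chi-square with k degrees of freedom for k traits. A dependency-free sketch:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def multi_trait_chi2(t, V):
    """Approximate multi-trait statistic t' V^{-1} t for one variant.

    t: signed single-trait test statistics (e.g. t-values) for k traits.
    V: k x k correlation matrix of those statistics across traits.
    """
    x = solve(V, t)
    return sum(ti * xi for ti, xi in zip(t, x))


print(multi_trait_chi2([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))
```

With two traits whose statistics correlate at r = 0.5, t = (3, 4) gives about 17.3 rather than the 25 obtained by ignoring the correlation, which is why correlated udder traits cannot simply have their single-trait statistics summed.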

Haplotag: software for haplotype-based genotyping-by-sequencing analysis

Posted on November 9, 2015 by schraib

Nicholas A Tinker, Wubishet A Bekele, Jiro Hattori

bioRxiv doi: http://dx.doi.org/10.1101/031013

Genotyping-by-sequencing (GBS) and related methods are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of SNPs within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analysing large data sets, applications of GBS may require substantial time, expertise and computational resources. Haplotag, the novel GBS software described here, is freely available and operates with minimal user investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode and a production mode; (4) discovers polymorphisms based on a model of local haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; (6) provides an intuitive visual "passport" for each inferred locus.
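To illustrate what discovering polymorphisms from local haplotypes within tags can mean, here is a toy, reference-free sketch. It is not Haplotag's algorithm: the input format (equal-length reads from one tag), the min_count error filter and the function name are all hypothetical. Identical reads are collapsed into candidate haplotypes, rare ones are dropped as likely sequencing errors, and SNPs are the positions where the retained haplotypes disagree. Because it keeps however many haplotypes pass the filter, the logic does not assume diploidy.

```python
from collections import Counter


def call_tag_haplotypes(reads, min_count=2):
    """Toy reference-free haplotype calling within one sequenced tag.

    reads: equal-length sequences from the same tag (hypothetical input format).
    Haplotypes seen fewer than min_count times are treated as probable errors.
    Returns (retained haplotypes with counts, SNP positions where they differ).
    """
    counts = Counter(reads)
    haps = {h: c for h, c in counts.items() if c >= min_count}
    if not haps:
        return {}, []
    length = len(next(iter(haps)))
    snps = [pos for pos in range(length) if len({h[pos] for h in haps}) > 1]
    return haps, snps


reads = ["ACGTAC"] * 5 + ["ACGAAC"] * 4 + ["TCGTAC"]  # last read: likely an error
print(call_tag_haplotypes(reads))
```

Reporting the retained haplotypes themselves, rather than only the variable positions, corresponds to the haplotype-based genotypes mentioned in point (5).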

Statistically-Consistent k-mer Methods for Phylogenetic Tree Reconstruction

Posted on November 9, 2015 by schraib

Elizabeth S. Allman, John A. Rhodes, Seth Sullivant

Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared-Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well, since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
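The baseline criticized here is easy to state in code. The sketch below computes the squared-Euclidean distance between k-mer count vectors; it uses raw counts rather than normalized frequencies (for sequences of equal length the two differ only by a constant scaling), and it does not implement the authors' model-based correction.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sq_euclidean_kmer_distance(a: str, b: str, k: int) -> int:
    """Uncorrected squared-Euclidean distance between k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for absent k-mers, so the union covers every coordinate.
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


print(sq_euclidean_kmer_distance("ACGT", "ACGA", 2))
```

A single substitution near the end of a sequence changes at most k k-mers, while one in the middle changes up to k on each side of the swap, so this distance does not grow linearly with evolutionary divergence; that mismatch is the intuition for why an uncorrected version need not behave like a tree metric.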

The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

Posted on November 6, 2015 by schraib

Sandeep J Joseph, Ben Li, Robert A Petit, Zhaohui Qin, Lyndsey Darrow, Timothy D Read

bioRxiv doi: http://dx.doi.org/10.1101/030692

Metagenome shotgun sequence data offer the potential for large-scale biogeographic analysis of microbial species. In this project we developed a method for detecting 33 common subtypes of the pathogenic bacterium Staphylococcus aureus. We used a binomial mixture model implemented in the binstrain software and the coverage counts at > 100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of each subtype in metagenome samples. Using this pipeline we were able to obtain > 87% sensitivity and > 94% specificity when testing on low genome coverage samples (0.025X) of diverse S. aureus strains. We found that 321 and 149 metagenome samples from the Human Microbiome Project (HMP) and the metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage > 0.025X. In both projects, CC8 and CC30 were the most common S. aureus subtypes encountered. We found evidence that the subtype compositions at different body sites of the same individual were more similar than expected under random sampling, and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are risk factors for S. aureus, but there was limited power to find factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other unknown factors influence the taxonomic distribution of the species around the city. We argue that pathogen detection in metagenome samples requires the use of subtypes based on whole-species population genomic analysis rather than ad hoc collections of reference strains.
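The subtype-proportion step can be sketched as a binomial mixture fitted by expectation-maximization. This is a generic illustration rather than binstrain's implementation: it assumes each subtype carries a known allele (0 or 1) at each SNP site, that each read comes from subtype j with probability pi_j, and that a read shows the wrong allele with a fixed error rate eps.

```python
def estimate_subtype_proportions(sites, genotypes, eps=0.01, iters=500):
    """EM estimate of subtype mixture proportions from allele read counts.

    sites: list of (alt_reads, total_reads) at known SNP positions.
    genotypes: genotypes[j][i] is 1 if subtype j carries the alternative
               allele at site i, else 0 (from a reference panel).
    eps: assumed per-read sequencing error rate.
    """
    k = len(genotypes)
    pi = [1.0 / k] * k
    for _ in range(iters):
        resp_total = [0.0] * k
        n_reads = 0
        for i, (alt, tot) in enumerate(sites):
            for allele, count in ((1, alt), (0, tot - alt)):
                if count == 0:
                    continue
                # E-step: P(read came from subtype j | observed allele).
                w = [pi[j] * ((1 - eps) if genotypes[j][i] == allele else eps)
                     for j in range(k)]
                z = sum(w)
                for j in range(k):
                    resp_total[j] += count * w[j] / z
                n_reads += count
        # M-step: new proportions are the average responsibilities over reads.
        pi = [r / n_reads for r in resp_total]
    return pi


print(estimate_subtype_proportions([(70, 100), (30, 100)], [[1, 0], [0, 1]]))
```

With two subtypes that differ at two sites, 70 of 100 reads showing subtype A's allele at the first and 30 of 100 showing subtype B's allele at the second, the estimate for A converges to about 0.70 (slightly above, because of the assumed error rate).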

Haldane's Sieve

Discussing preprints in population and evolutionary genetics

Author Archives: schraib
