On the importance of being structured: instantaneous coalescence rates and a re-evaluation of human evolution

Olivier Mazet, Willy Rodríguez, Simona Grusea, Simon Boitard, Lounès Chikhi

Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods ignore population structure and reconstruct a history characterized by population size changes under the assumption that species behave as panmictic units. This is potentially problematic since population structure can generate spurious signals of population size change. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading if not meaningless parameters. In a context of model uncertainty (panmixia versus structure), genomic data may thus not necessarily lead to improved statistical inference.
We consider two haploid genomes and develop a theory which explains why any demographic model (with or without population size changes) will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We introduce a new parameter, the IICR (inverse instantaneous coalescence rate), and show that it is equivalent to a population size only in panmictic models, and mostly misleading for structured models. We argue that this general issue affects all population genetics methods ignoring population structure. We take the PSMC method as an example and show that it infers population size changes that never took place. We apply our approach to human genomic data and find a reduction in gene flow at the start of the Pleistocene, a major increase throughout the Middle-Pleistocene, and an abrupt disconnection preceding the emergence of modern humans.
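
As a rough illustration of why a structured model can look like a size change, the sketch below reads the IICR as the ratio P(T2 > t) / f(t), where T2 is the coalescence time of two genes and f is its density. It evaluates that ratio for two genes sampled from the same deme of a symmetric island model with constant deme sizes. The function name `iicr_island`, the time scaling (coalescence rate 1 within a deme) and the per-lineage migration rate `mig` are assumptions made for this example, not the authors' notation or code.

```python
import numpy as np

def iicr_island(times, n_demes, mig):
    """IICR(t) for two genes sampled in the same deme of a symmetric island model.

    Assumed scaling (illustrative): two lineages in the same deme coalesce at
    rate 1; each lineage migrates at rate `mig` to a uniformly chosen other deme.
    Requires n_demes >= 2 and mig > 0.
    """
    times = np.asarray(times, dtype=float)
    # Transient states: 0 = both lineages in the same deme, 1 = different demes.
    # Coalescence (the absorbing state) is only reachable from state 0.
    back = 2.0 * mig / (n_demes - 1)          # different -> same deme
    q = np.array([[-(1.0 + 2.0 * mig), 2.0 * mig],
                  [back, -back]])
    # Matrix exponential through the eigendecomposition q = v diag(w) v^-1.
    w, v = np.linalg.eig(q)
    v_inv = np.linalg.inv(v)
    start = np.array([1.0, 0.0])              # both genes start in the same deme
    coeff = start @ v
    probs = (coeff * np.exp(np.outer(times, w))) @ v_inv   # shape (len(times), 2)
    survival = probs.sum(axis=1)              # P(T2 > t)
    density = probs[:, 0]                     # f(t) = 1 * P(same deme, not coalesced)
    return survival / density

if __name__ == "__main__":
    ts = np.linspace(0.0, 5.0, 6)
    for t, value in zip(ts, iicr_island(ts, n_demes=10, mig=0.5)):
        print(f"t = {t:.1f}  IICR = {value:.3f}")
```

Under these assumptions the curve starts at 1 (the scaled deme size) and rises to a plateau going back in time, even though no deme ever changes size. A method that assumes panmixia would read the same curve as a population that was larger in the past.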

Negative selection maintains transcription factor binding motifs in human cancer

I. E. Vorontsov, I. V. Kulakovskiy, G. Khimulya, E. N. Lukianova, D. D. Nikolaeva, I. A. Eliseeva, V. J. Makeev

Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and the expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer, and mutations in binding sites of selected transcription factors have been found to be under positive selection. However, negative selection of mutations in coding regions is elusive, and the significance of negative selection in non-coding regions remains controversial.
Here we present an analysis of transcription factors with binding sites co-localized with non-coding variants. To avoid statistical bias, we account for mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of binding motifs than expected by chance. Such conservation of motifs is even more pronounced in DNase-accessible regions.
Our data demonstrate negative selection against binding site alterations and suggest that this selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors and their respective conserved binding motifs can reveal cell regulatory pathways crucial for the survival of various human cancers.

Evolutionary history of the global emergence of the Escherichia coli epidemic clone ST131

Nicole Stoesser, Anna Sheppard, Louise Pankhurst, Nicola de Maio, Catrin E Moore, Robert Sebra, Paul Turner, Luke W Anson, Andrew Kasarskis, Elizabeth M Batty, Veronica Kos, Daniel J Wilson, Rattanaphone Phetsouvanh, David Wyllie, Evgeni Sokurenko, Amee R Manges, Timothy J Johnson, Lance B Price, Timothy E. A. Peto, James R Johnson, Xavier Didelot, Ann Sarah Walker, Derrick W Crook, Modernising Medical Microbiology Informatics Group

Strongly asymmetric hybridization barriers shape the origin of a new polyploid species and its hybrid ancestor

Mario Vallejo-Marin, Arielle Cooley, Michelle Qi, Madison Folmer, Michael McKain, Joshua Puzey

Climate and developmental plasticity: interannual variability in grapevine leaf morphology

Daniel H Chitwood, Susan M Rundell, Darren Y Li, Quaneisha L Woodford, Tommy T Yu, Jose R Lopez, Danny Greenblatt, Julie Kang, Jason P Londo

An Approximate Markov Model for the Wright-Fisher Diffusion

Anna Ferrer-Admetlla, Christoph Leuenberger, Jeffrey D Jensen, Daniel Wegmann

A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle

Hubert Pausch, Reiner Emmerling, Hermann Schwarzenbacher, Ruedi Fries

Haplotag: software for haplotype-based genotyping-by-sequencing analysis

Nicholas A Tinker, Wubishet A Bekele, Jiro Hattori

Statistically-Consistent k-mer Methods for Phylogenetic Tree Reconstruction

Elizabeth S. Allman, John A. Rhodes, Seth Sullivant

Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared-Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing the corrected distance out-performs many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well, since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
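
For orientation, the uncorrected quantity the abstract refers to can be sketched as follows: count overlapping k-mers in each sequence and take the squared Euclidean distance between the two count vectors. The function names are hypothetical, and raw counts rather than normalized frequencies are an assumption made for this example. This is only the naive distance, not the model-based correction the authors derive.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def squared_euclidean_kmer_distance(seq_a, seq_b, k):
    """Naive (uncorrected) squared Euclidean distance between k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # k-mers missing from one sequence count as zero (Counter returns 0 for them).
    all_kmers = set(counts_a) | set(counts_b)
    return sum((counts_a[kmer] - counts_b[kmer]) ** 2 for kmer in all_kmers)

if __name__ == "__main__":
    # "ACGT" has 2-mers AC, CG, GT; "ACGA" has AC, CG, GA; they differ in two 2-mers.
    print(squared_euclidean_kmer_distance("ACGT", "ACGA", k=2))  # prints 2
```

A pairwise matrix of such distances could then be passed to a distance-based tree builder, which is the step where, according to the abstract, the uncorrected distance can lead to inconsistent inference.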

The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

Sandeep J Joseph, Ben Li, Robert A Petit, Zhaohui Qin, Lyndsey Darrow, Timothy D Read