Happy New Year Homo erectus? More evidence for interbreeding with archaics predating the modern human/Neanderthal split

Peter J. Waddell
(Submitted on 30 Dec 2013)

A range of a priori hypotheses about the evolution of modern and archaic genomes are further evaluated and tested. In addition to the well-known splits/introgressions involving Neanderthal genes into out-of-Africa people, or Denisovan genes into Oceanians, a further series of archaic splits and hypotheses proposed in Waddell et al. (2011) are considered in detail. These include signals of Denisovans with something markedly more archaic and possibly something more archaic into Papuans as well. These are compared and contrasted with some well-advertised introgressions such as Denisovan genes across East Asia, archaic genes into San or non-tree mixing between Oceanians, East Asians and Europeans. The general result is that these less appreciated and surprising archaic splits have just as much or more support in genome sequence data. Further, evaluation confirms the hypothesis that archaic genes are much rarer on modern X chromosomes, and may even be near totally absent, suggesting strong selection against their introgression. Modeling of relative split weights allows an inference of the proportion of the genome the Denisovan seems to have gotten from an older archaic, and the best estimate is around 2%. Using a mix of quantitative and qualitative morphological data and novel phylogenetic methods, robust support is found for multiple distinct middle Pleistocene lineages. Of these, fossil hominids such as SH5, Petralona, and Dali, in particular, look like prime candidates for contributing pre-Neanderthal/Modern archaic genes to Denisovans, while the Jinniu-Shan fossil looks like the best candidate for a close relative of the Denisovan. That the Papuans might have received some truly archaic genes appears a good possibility and they might even be from Homo erectus.

Sequence Capture Versus Restriction Site Associated DNA Sequencing for Phylogeography

Michael G. Harvey, Brian Tilston Smith, Travis C. Glenn, Brant C. Faircloth, Robb T. Brumfield
(Submitted on 22 Dec 2013)

Genomic datasets generated with massively parallel sequencing methods have the potential to propel systematics in new and exciting directions, but selecting appropriate markers and methods is not straightforward. We applied two approaches with particular promise for systematics, restriction site associated DNA sequencing (RAD-Seq) and sequence capture (Seq-cap) of ultraconserved elements (UCEs), to the same set of samples from a non-model, Neotropical bird. We found that both RAD-Seq and Seq-cap produced genomic datasets containing thousands of loci and SNPs and that the inferred population assignments and species trees were concordant between datasets. However, model-based estimates of demographic parameters differed between datasets, particularly when we estimated the parameters using a method based on allele frequency spectra. The differences we observed may result from differences in assembly, alignment, and filtering of sequence data between methods, and our findings suggest that caution is warranted when using allele frequencies to estimate parameters from low-coverage sequencing data. We further explored the differences between methods using simulated Seq-cap- and RAD-Seq-like datasets. Analyses of simulated data suggest that increasing the number of loci from 500 to 5000 increased phylogenetic concordance factors and the accuracy and precision of demographic parameter estimates, but increasing the number of loci past 5000 resulted in minimal gains. Increasing locus length from 64 bp to 500 bp improved phylogenetic concordance factors and minimal gains were observed with loci longer than 500 bp, but locus length did not influence the accuracy and precision of demographic parameter estimates. We discuss our results relative to the diversity of data collection methods available, and we provide advice for harnessing next-generation sequencing for systematics research.

Ancient human genomes suggest three ancestral populations for present-day Europeans

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Peter H. Sudmant, Joshua G. Schraiber, Sergi Castellano, Karola Kirsanow, Christos Economou, Ruth Bollongino, Qiaomei Fu, Kirsten Bos, Susanne Nordenfelt, Cesare de Filippo, Kay Prüfer, Susanna Sawyer, Cosimo Posth, Wolfgang Haak, Fredrik Hallgren, Elin Fornander, George Ayodo, Hamza A. Babiker, Elena Balanovska, Oleg Balanovsky, Haim Ben-Ami, Judit Bene, Fouad Berrada, Francesca Brisighelli, George B.J. Busby, Francesco Cali, Mikhail Churnosov, David E.C. Cole, Larissa Damba, Dominique Delsate, George van Driem, Stanislav Dryomov, Sardana A. Fedorova, Michael Francken, Irene Gallego Romero, Marina Gubina, Jean-Michel Guinet, Michael Hammer, Brenna Henn, Tor Helvig, Ugur Hodoglugil, Aashish R. Jha, Rick Kittles, Elza Khusnutdinova, Toomas Kivisild, Vaidutis Kučinskas, Rita Khusainova, Alena Kushniarevich, Leila Laredj, Sergey Litvinov, Robert W. Mahley, Béla Melegh, Ene Metspalu, Joanna Mountain, Thomas Nyambo, Ludmila Osipova, Jüri Parik, Fedor Platonov, Olga L. Posukh, Valentino Romano, Igor Rudan, Ruslan Ruizbakiev, Hovhannes Sahakyan, Antonio Salas, Elena B. Starikovskaya, Ayele Tarekegn, Draga Toncheva, Shahlo Turdikulova, Ingrida Uktveryte, Olga Utevska, Mikhail Voevoda, Joachim Wahl, Pierre Zalloua, Levon Yepiskoposyan, Tatijana Zemunik, Alan Cooper, Cristian Capelli, Mark G. Thomas, Sarah A. Tishkoff, Lalji Singh, Kumarasamy Thangaraj, Richard Villems, David Comas, Rem Sukernik, Mait Metspalu, Matthias Meyer, Evan E. Eichler, Joachim Burger, Montgomery Slatkin, Svante Pääbo, Janet Kelso, David Reich, Johannes Krause

Analysis of ancient DNA can reveal historical events that are difficult to discern through study of present-day individuals. To investigate European population history around the time of the agricultural transition, we sequenced complete genomes from a ~7,500 year old early farmer from the Linearbandkeramik (LBK) culture from Stuttgart in Germany and an ~8,000 year old hunter-gatherer from the Loschbour rock shelter in Luxembourg. We also generated data from seven ~8,000 year old hunter-gatherers from Motala in Sweden. We compared these genomes and published ancient DNA to new data from 2,196 samples from 185 diverse populations to show that at least three ancestral groups contributed to present-day Europeans. The first are Ancient North Eurasians (ANE), who are more closely related to Upper Paleolithic Siberians than to any present-day population. The second are West European Hunter-Gatherers (WHG), related to the Loschbour individual, who contributed to all Europeans but not to Near Easterners. The third are Early European Farmers (EEF), related to the Stuttgart individual, who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model the deep relationships of these populations and show that about 44% of the ancestry of EEF derived from a basal Eurasian lineage that split prior to the separation of other non-Africans.

The causal meaning of genomic predictors and how it affects the construction and comparison of genome-enabled selection models

Bruno D Valente, Gota Morota, Guilherme JM Rosa, Daniel Gianola, Kent Weigel

The additive genetic effect is arguably the most important quantity inferred in animal and plant breeding analyses. The term effect indicates that it represents causal information, which is different from standard statistical concepts such as regression coefficients and association. The process of inferring causal information is also different from standard statistical learning, as the former requires causal (i.e. non-statistical) assumptions and involves extra complexities. Remarkably, the task of inferring genetic effects is largely seen as a standard regression/prediction problem, contradicting its label. This widely accepted analysis approach is by itself insufficient for causal learning, suggesting that causality is not the point for selection. Given this incongruence, it is important to verify whether genomic predictors need to represent causal effects to be relevant for selection decisions, especially because applying regression studies to answer causal questions may lead to wrong conclusions. The answer to this question determines whether genomic selection models should be constructed aiming for maximum genomic predictive ability or aiming for identifiability of genetic causal effects. Here, we demonstrate that selection relies on a causal effect from genotype to phenotype, and that genomic predictors are only useful for selection if they distinguish such an effect from other sources of association. Conversely, genomic predictors capturing non-causal signals provide information that is less relevant for selection regardless of the resulting predictive ability. Focusing on covariate choice decisions, simulated examples are used to show that predictive ability, which is the criterion normally used to compare models, may not indicate the quality of genomic predictors for selection. Additionally, we propose using alternative criteria to construct models aiming for the identification of the genetic causal effects.

The availability of research data declines rapidly with article age

Timothy Vines, Arianne Albert, Rose Andrew, Florence Débarre, Dan Bock, Michelle Franklin, Kimberly Gilbert, Jean-Sébastien Moore, Sébastien Renaut, Diana J. Rennison
(Submitted on 19 Dec 2013)

Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5,6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested datasets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a dataset being extant fell by 17% per year. In addition, the odds that we could find a working email address for the first, last or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.
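
To make the 17% per year figure concrete: a 17% yearly fall in odds means the odds of a dataset being extant are multiplied by 0.83 for each additional year of article age. The short Python sketch below converts that into probabilities. It is an illustration only, not the authors' analysis; the function name and the starting probability of 0.9 are assumptions chosen for the example, and only the 17% decline and the 2 to 22 year age range come from the abstract.

```python
def availability_after(initial_prob: float, years: float,
                       yearly_odds_decline: float = 0.17) -> float:
    """Probability that a dataset is still extant after `years`.

    Assumes the odds shrink by a constant fraction each year
    (0.17 is the figure quoted in the abstract). `initial_prob`
    is a hypothetical starting probability, not a reported value.
    """
    if not 0.0 < initial_prob < 1.0:
        raise ValueError("initial_prob must be strictly between 0 and 1")
    # Convert probability to odds, apply the yearly multiplier, convert back.
    odds = initial_prob / (1.0 - initial_prob)
    odds *= (1.0 - yearly_odds_decline) ** years
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Article ages spanning the range studied (2 to 22 years).
    for years in (2, 10, 22):
        print(years, round(availability_after(0.9, years), 3))
```

Because the decline acts on odds rather than on probability, the drop in probability is not a fixed number of percentage points per year; it depends on the starting value.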

Author post: Dynamic DNA Processing: A Microcode Model of Cell Differentiation

The following guest post is by Barry Jacobson on his preprint “Dynamic DNA Processing: A Microcode Model of Cell Differentiation”, arXived here.

The paper suggests that DNA should be viewed as a processor that operates by means of base-pairing with remote regions of the genome. If one sequence matches another (or is complementary to it) it will set up a structural loop, or other interaction. However, the paper postulates that at least one region of the genome of every cell will have a unique clock sequence that is shared by no other cell. Therefore, the clock of one cell may not match the same distant sequences as the clock of another. Thus, the pattern of loops that is formed, and the overall 3-D DNA structure, may differ from cell to cell. This will either assist or hinder binding of transcription factors in one type of cell, as compared to another, thus providing a mechanism of differential gene expression.

We discuss a method by which these differing clock sequences could be generated in cell division, so that the daughters each end up with a unique identifier. The identifier then unlocks certain conformations only for those cell types for which it is relevant. SNPs may function in a similar manner, by modifying 3-D configurations, thus altering TF activity.

We further postulate that if a clock or target is errantly mutated, so that it matches a target farther away than was intended, it may stretch the chromosome to the breaking point, and this is the cause of chromosomal breakage or translocations in cancer.

Finally, we allow for the possibility that a cell can modify its clock in response to the environment, such as when healing from trauma, or accepting a graft, in which case it needs to coordinate with neighboring cells. We suggest that perhaps chemical analogs of cell surface proteins may occasionally mistrigger such a clock modification, when none is necessary, and thereby cause incorrect matches and conformations in that cell, which can damage DNA, and lead to cancer, as before.

We realize this is all purely speculative. However, we originally submitted this model to Nature without success 16 years ago, and since then a number of its assumptions have been verified, as detailed in the recent submission to arXiv. We therefore believe it deserves a second look.

Massively differential bias between two widely used Illumina library preparation methods for small RNA sequencing

Jeanette Baran-Gale, Michael R Erdos, Christina Sison, Alice Young, Emily E Fannin, Peter S Chines, Praveen Sethupathy

Recent advances in sequencing technology have helped unveil the unexpected complexity and diversity of small RNAs. A critical step in small RNA library preparation for sequencing is the ligation of adapter sequences to both the 5’ and 3’ ends of small RNAs. Two widely used protocols for small RNA library preparation, Illumina v1.5 and Illumina TruSeq, use different pairs of adapter sequences. In this study, we compare the results of small RNA-sequencing between v1.5 and TruSeq and observe a striking differential bias. Nearly 100 highly expressed microRNAs (miRNAs) are >5-fold differentially detected and 48 miRNAs are >10-fold differentially detected between the two methods of library preparation. In fact, some miRNAs, such as miR-24-3p, are over 30-fold differentially detected. The results are reproducible across different sequencing centers (NIH and UNC) and both major Illumina sequencing platforms, GAIIx and HiSeq. While some level of bias in library preparation is not surprising, the apparent massive differential bias between these two widely used adapter sets is not well appreciated. As increasingly more laboratories transition to the newer TruSeq-based library preparation for small RNAs, researchers should be aware of the extent to which the results may differ from previously published results using v1.5.
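
As a rough illustration of what ">5-fold differentially detected" means in practice, the Python sketch below normalizes raw miRNA read counts from two libraries to reads per million and reports the miRNAs whose fold difference exceeds a threshold. It is a minimal sketch under stated assumptions, not the authors' pipeline: the reads-per-million normalization, the pseudocount, the function names, and the miRNA names and counts in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so each library sums to one million reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {name: 1e6 * c / total for name, c in counts.items()}


def fold_difference(a: float, b: float, pseudocount: float = 1.0) -> float:
    """Symmetric fold difference (always >= 1) between two abundances.

    The pseudocount (an assumption) avoids division by zero.
    """
    hi = max(a, b) + pseudocount
    lo = min(a, b) + pseudocount
    return hi / lo


def differentially_detected(lib_a: Dict[str, int], lib_b: Dict[str, int],
                            min_fold: float = 5.0) -> List[Tuple[str, float]]:
    """List (miRNA, fold) pairs exceeding `min_fold` between two libraries."""
    rpm_a = reads_per_million(lib_a)
    rpm_b = reads_per_million(lib_b)
    shared = sorted(set(rpm_a) & set(rpm_b))  # only miRNAs seen in both
    folds = [(name, fold_difference(rpm_a[name], rpm_b[name])) for name in shared]
    return [(name, fold) for name, fold in folds if fold > min_fold]


if __name__ == "__main__":
    # Hypothetical counts for the same sample prepared with two adapter sets.
    v15 = {"miR-x": 950, "miR-y": 50}
    truseq = {"miR-x": 500, "miR-y": 500}
    print(differentially_detected(v15, truseq))
```

Raising `min_fold` to 10.0 or 30.0 corresponds to the stricter thresholds mentioned in the abstract.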