I (@joe_pickrell) was recently asked by a journal to review a preprint by Decker et al., “Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle”. Below are the comments I sent the journal.
In this paper, the authors apply a suite of population genetics analyses to a set of cattle breeds. The basic data consist of around 1,500 individuals from 143 breeds typed at around 40,000 SNPs. The authors use these data to build population trees/graphs using TreeMix and visualize population structure with PCA/ADMIXTURE. They then interpret the results of these programs in light of their knowledge of the history of cattle domestication. I had no knowledge of cattle history prior to reading this manuscript, so I enjoyed reading it. I first have a few comments on the manuscript as a whole, then comments on individual points.
1. A lot of interpretation depends on the robustness of the inferred population graph from TreeMix. It would be extremely helpful to see that the estimated graph is consistent across different random starting points. The authors could run TreeMix, say, five different times, and compare the results across runs. I expect that many of the inferred migration edges will be consistent, but a subset will not. It’s probably most interesting to focus interpretation on the edges that are consistent.
2. Throughout the manuscript, inference from genetics is mixed in with evidence from other sources. At times it becomes unclear which claims are made strictly from genetics and which are not. For example, the authors write, “Anatolian breeds are admixed between European, African, and Asian cattle, and do not represent the populations originally domesticated in the region”. It seems possible that the first part of that statement (about admixture) could be their conclusion from the genetic data, but it’s difficult to make the second statement (about the original populations in the region) from genetics, so presumably this is based on other sources. In general, I would suggest splitting the results internal to this paper apart from the other statements and making a clear firewall between their results and the historical interpretation of the results (right now the authors have a “Results and Discussion” section; it might be easiest to do this by splitting the “Results” from the “Discussion”, but this is up to the authors).
3. Related to the above point, could the authors add subsection headings to the results/discussion section? Right now the topic of the paper jumps around considerably from paragraph to paragraph, and at points I had difficulty following. One possibility would be to organize the subheadings by the claims made in the abstract, e.g. “Cline of indicine introgression into Africa”, “wild African auroch ancestry”, etc.
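To make the robustness check in comment 1 concrete, here is a minimal sketch of how migration edges could be compared across runs. It assumes the edges from each TreeMix run have already been extracted by hand or by a separate script into (source, target) pairs; the population labels below are invented placeholders, not results from the manuscript, and the function names are my own.

```python
from collections import Counter

def edge_support(runs):
    """Fraction of runs in which each (source, target) migration edge appears.

    runs: list of collections of (source, target) tuples, one per run.
    """
    counts = Counter(edge for run in runs for edge in set(run))
    return {edge: count / len(runs) for edge, count in counts.items()}

def consistent_edges(runs, min_fraction=1.0):
    """Edges present in at least min_fraction of the runs."""
    support = edge_support(runs)
    return {edge: frac for edge, frac in support.items() if frac >= min_fraction}

# Hypothetical edges from five runs (placeholder labels only).
example_runs = [
    {("X", "Y"), ("P", "Q")},
    {("X", "Y")},
    {("X", "Y"), ("P", "Q")},
    {("X", "Y")},
    {("X", "Y"), ("R", "S")},
]

# Only edges found in every run are kept with the default threshold.
robust = consistent_edges(example_runs)
```

Exact matching of edges is a simplification: in practice an edge may attach to slightly different points on the tree in different runs, so some judgment about which edges count as "the same" would still be needed.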
There are quite a few results claimed in this paper, so I’m going to split my comments apart by the results reported in the abstract. As mentioned above, it would be nice if the authors clearly stated exactly which pieces of evidence they view as supporting each of these, perhaps in subheadings in the Results section. In italics is the relevant sentence in the abstract, followed by my thoughts:
Using 19 breeds, we map the cline of indicine introgression into Africa.
This claim is based on interpretation of the ADMIXTURE plot in Figure 5. I wonder if a map might make this point more clearly than Figure 5, however; the three-letter population labels in Figure 5 are not very easy to read, especially since most readers will have no knowledge of the geographic locations of these breeds.
We infer that African taurine possess a large portion of wild African auroch ancestry, causing their divergence from Eurasian taurine.
This claim appears to be largely based on the interpretation of the TreeMix plot in Figure 4. This figure shows an admixture edge from the ancestors of the European breeds into the African breeds. As noted above, it seems important that this migration edge be robust across different TreeMix runs. Also, labeling this ancestry as “wild African auroch ancestry” seems like an interpretation of the data rather than something that has been explicitly tested, since the authors don’t have wild African aurochs in their data.
Additionally, the authors claim that this result shows “there was not a third domestication process, rather there was a single origin of domesticated taurine…”. I may be missing something, but it seems that genetic data cannot distinguish whether a population was “domesticated” or “wild”. That is, it seems plausible that the source population tentatively identified in Figure 4 may have been independently domesticated. There may be other sources of evidence that refute this interpretation, but this is another example of where it would be useful to have a firewall between the genetic results and the interpretation in light of other evidence. The speculation about the role of disease resistance in introgression is similarly not based on evidence from this paper and should probably be set apart.
We detect exportation patterns in Asia and identify a cline of Eurasian taurine/indicine hybridization in Asia.
The cline of taurine/indicine hybridization is based on interpretation of ADMIXTURE plots and some follow-up f4 statistics. I found this difficult to follow, especially since a significant f4 statistic can have multiple interpretations. Perhaps the authors could draw out the proposed phylogeny for these breeds and explain the reasons they chose particular f4 statistics to highlight.
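For readers less familiar with these statistics, a minimal sketch of the quantity being discussed may help. It uses the plain product of allele frequency differences, averaged over SNPs, with no standard error; the function name and the toy frequencies are illustrative assumptions, not anything taken from the manuscript.

```python
def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one entry per SNP,
    with SNPs in the same order in all four lists.
    """
    products = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(products) / len(products)

# Toy frequencies at two SNPs (invented for illustration).
pop_a = [0.2, 0.8]
pop_b = [0.4, 0.6]
pop_c = [0.5, 0.5]
pop_d = [0.5, 0.5]

value = f4(pop_a, pop_b, pop_c, pop_d)
```

If the four populations are related by the unrooted tree ((A, B), (C, D)) with no admixture, the expected value is zero. A significant deviation says that this tree does not fit, but not which of the four populations is responsible, which is why drawing out the proposed phylogeny would help.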
We also identify the influence of species other than Bos taurus in the formation of Asian breeds.
The conclusion that species other than Bos taurus have introgressed into Asian breeds seems to be based on interpretation of branch lengths in the trees in Figures 2-3 and some f3 statistics. The interpretation of branch lengths is extremely weak evidence for introgression, probably not even worth mentioning. The f3 statistics are potentially quite informative though. For the breeds in question (Brebes and Madura), which pairs of populations give the most negative f3 statistics? This is difficult information to extract from Supplementary Table 2, where the populations appear to be sorted alphabetically. A table showing the (for example) five most negative f3 statistics could be quite useful here. In general, if the SNP ascertainment scheme is not extremely complicated (can the authors describe the ascertainment scheme for this array?), a negative f3 statistic is very strong evidence that a target population is admixed, while a significant f4 statistic only means that at least one of the four populations in the statistic is admixed. This might be a useful property for the authors.
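The table suggested above amounts to ranking reference pairs by their f3 value for a fixed target. Here is a rough sketch under simplifying assumptions: it uses the naive product of frequency differences without the finite-sample correction or the standard errors that a real analysis would need, and the population names and frequencies are invented placeholders.

```python
from itertools import combinations

def f3(target, ref_a, ref_b):
    """Naive f3(target; ref_a, ref_b): mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, ref_a, ref_b)]
    return sum(products) / len(products)

def most_negative_f3(freqs, target, k=5):
    """Return the k reference pairs with the lowest f3 for the target.

    freqs: dict mapping population name to a list of allele frequencies.
    Result: list of (f3 value, ref_a, ref_b), most negative first.
    """
    others = sorted(name for name in freqs if name != target)
    scores = [
        (f3(freqs[target], freqs[a], freqs[b]), a, b)
        for a, b in combinations(others, 2)
    ]
    return sorted(scores)[:k]

# Toy data: "Target" sits midway between "RefA" and "RefB" at both SNPs.
toy_freqs = {
    "Target": [0.5, 0.5],
    "RefA": [0.1, 0.9],
    "RefB": [0.9, 0.1],
    "RefC": [0.5, 0.5],
}

top_pairs = most_negative_f3(toy_freqs, "Target")
```

In the toy data the pair (RefA, RefB) ranks first with a negative value, as expected for a target that is intermediate between two sources. Whether a negative value is significant would still have to be judged from its Z-score.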
We detect the pronounced influence of Shorthorn cattle in the formation of European breeds.
This conclusion appears to be based on interpretation of ADMIXTURE plots in Figures S6-S9. Interpreting these types of plots is notoriously difficult. I wonder if the f3 statistics might be useful here: do the authors get negative f3 statistics in the populations they write “share ancestry with Shorthorn cattle” when using the Durham shorthorns as one reference?
Iberian and Italian cattle possess introgression from African taurine.
This conclusion is based on ADMIXTURE plots and TreeMix; it would be interesting to see the results from f3 statistics as well.
American Criollo cattle are shown to be of Iberian, and not African, descent.
I found this difficult to follow: the authors write that these breeds “derive 7.5% of their ancestry from African taurine introgression”, so presumably they are in fact partially of African descent?
Indicine introgression into American cattle occurred in the Americas, and not Europe.
This conclusion seems difficult to make from genetic data. The authors identify “indicine” ancestry in American cattle, so I don’t see how they can determine whether this happened before or after a migration without temporal information. It would be helpful if the authors walk the reader through each logical step they’re making so that the reader can decide whether they believe the evidence for each step.