Author post: Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle

This guest post is by Jared Decker on his preprint (with colleagues) “Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle”, arXived here. The post is a response to the review posted by Joe Pickrell here.

I have posted an updated version of my preprint on arXiv. Because Joe Pickrell posted his review of my preprint “Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle” on Haldane’s Sieve, I thought readers might enjoy seeing my response. I have really enjoyed having the process open to the public.

Reviewer's comments are in blue.
My comments are in black.
Quotes from the manuscript are in Arial font.

Reviewer #1 [Joe Pickrell]

Overall comments:

1. A lot of interpretation depends on the robustness of the inferred population graph from TreeMix. It would be extremely helpful to see that the estimated graph is consistent across different random starting points. The authors could run TreeMix, say, five different times, and compare the results across runs. I expect that many of the inferred migration edges will be consistent, but a subset will not. It's probably most interesting to focus interpretation on the edges that are consistent.

We followed Reviewer #1's recommendation and have included 6 phylogenetic networks (the original network and 5 replicates) as supplementary Figure S4. The admixed histories of several of the sample populations are quite complex, and as seen in Figure S4, the same relationships can be represented multiple ways. For example, if population A is admixed between populations B, C, and D, it can be placed sister to population B with migration edges from C and D, or it can be placed sister to C with migration edges from B and D. We have tried to note in the manuscript when migration edges are not consistent. But one of the main points of the paper, introgression from an ancestral population into the African taurine clade, is consistent across all replicates.

To the third paragraph of the Admixture in Europe subsection we added, “The placement of Italian breeds is not consistent across independent TreeMix runs (Figure S4), likely due to their complicated history of admixture.”

In the second to last paragraph of the manuscript we state, “In TreeMix replicates, Texas Longhorn and Romosinuano are either sister to admixed Anatolian breeds or they receive a migration edge that originates near Brahman (Figure S4).”
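The replicate comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used for Figure S4: the function name `consistent_edges`, the `(source, target)` edge representation, and the example labels are all hypothetical. As noted above, the same admixed history can be drawn with different edges, so a naive edge-by-edge comparison like this one will understate agreement between runs.

```python
from collections import Counter

def consistent_edges(runs, min_fraction=1.0):
    """Return the migration edges found in at least min_fraction of the runs.

    Each run is a collection of (source, target) labels. This representation
    is an assumption made for illustration, not TreeMix's output format.
    """
    counts = Counter(edge for run in runs for edge in set(run))
    n_runs = len(runs)
    return {edge for edge, count in counts.items() if count / n_runs >= min_fraction}

# Hypothetical edges for three replicate runs (not real results).
runs = [
    {("ancestral node", "African taurine"), ("B", "A")},
    {("ancestral node", "African taurine"), ("C", "A")},
    {("ancestral node", "African taurine"), ("B", "A")},
]

stable = consistent_edges(runs)                        # present in every run
mostly = consistent_edges(runs, min_fraction=0.6)      # present in most runs
print(sorted(stable))
print(sorted(mostly))
```

Lowering `min_fraction` trades strictness for coverage; which threshold is appropriate depends on how many replicates are available.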

2. Throughout the manuscript, inference from genetics is mixed in with evidence from other sources. At points it sometimes becomes unclear which points are made strictly from genetics and which are not.

We have edited the manuscript by adding citations to clarify which inference is from genetics and which is from previous studies or breed histories.

For example, the authors write, “Anatolian breeds are admixed between European, African, and Asian cattle, and do not represent the populations originally domesticated in the region”. It seems possible that the first part of that statement (about admixture) could be their conclusion from the genetic data, but it's difficult to make the second statement (about the original populations in the region) from genetics, so presumably this is based on other sources.

We edited this sentence to say, “Anatolian breeds (AB, EAR, TG, ASY, and SAR) are admixed between blue European-like, grey African-like, and green indicine-like cattle (Figures 5 and 6), and we infer they do not represent the taurine populations originally domesticated in this region due to a history of admixture.”

In general, I would suggest splitting the results internal to this paper apart from the other statements and making a clear firewall between their results and the historical interpretation of the results (right now the authors have a “Results and Discussion” section, but it might be easiest to do this by splitting the “Results” from the “Discussion”. But this is up to the authors.).

The corresponding authors of this manuscript (Decker and Taylor) prefer to have the results and discussion sections combined, so we appreciate Reviewer #1 leaving that decision up to us. But we recognize that he brings up a valid point and have striven to make the distinction between results and discussion clearer throughout the manuscript.

3. Related to the above point, could the authors add subsection headings to the results/discussion section? Right now the topic of the paper jumps around considerably from paragraph to paragraph, and at points I had difficulty following. One possibility would be to organize subheading by the claims made in the abstract, e.g. “Cline of indicine introgression into Africa”, “wild African auroch ancestry”, etc.

Subsection headings have been added.

Specific comments:

There are quite a few results claimed in this paper, so I'm going to split my comments apart by the results reported in the abstract. As mentioned above, it would be nice if the authors clearly stated exactly which pieces of evidence they view as supporting each of these, perhaps in subheadings in the Results section. In italics is the relevant sentence in the abstract, followed by my thoughts:

Using 19 breeds, we map the cline of indicine introgression into Africa.

This claim is based on interpretation of the ADMIXTURE plot in Figure 5. I wonder if a map might make this point more clearly than Figure 5, however; the three-letter population labels in Figure 5 are not very easy to read, especially since most readers will have no knowledge of the geographic locations of these breeds.

Map added as Figure 5, with previous ADMIXTURE figure as Figure 6 so that readers can still see individual breed ancestries.

“We infer that African taurine possess a large portion of wild African auroch ancestry, causing their divergence from Eurasian taurine.”

This claim appears to be largely based on the interpretation of the treemix plot in Figure 4. This figure shows an admixture edge from the ancestors of the European breeds into the African breeds. As noted above, it seems important that this migration edge be robust across different treemix runs. Also, labeling this ancestry as “wild African auroch ancestry” seems like an interpretation of the data rather than something that has been explicitly tested, since the authors don’t have wild African aurochs in their data.

This migration edge is robust across 6 different TreeMix runs. The edge is from a node that is ancestral to European, Asian, and African taurine, and this node is approximately halfway between the common ancestor of domesticated indicine and the common ancestor of domesticated taurine. African auroch are extinct. Most, if not all, bovine ancient DNA samples come from much colder climates than northern Africa. So we are unable to sample African aurochs.

But, we feel it is a strength of the TreeMix analysis to identify introgression from ancestral populations that have not been sampled. We feel the interpretation that the introgression is from African auroch is the most parsimonious explanation of our PCA, ADMIXTURE, and TreeMix results.

Additionally, the authors claim that this result shows “there was not a third domestication process, rather there was a single origin of domesticated taurine”. I may be missing something, but it seems that genetic data cannot distinguish whether a population was “domesticated” or “wild”. That is, it seems plausible that the source population tentatively identified in Figure 4 may have been independently domesticated. There may be other sources of evidence that refute this interpretation, but this is another example of where it would be useful to have a firewall between the genetic results and the interpretation in light of other evidence. The speculation about the role of disease resistance in introgression is similarly not based on evidence from this paper and should probably be set apart.

The claim that there was a single origin of domesticated taurine is based upon the topology of the phylogenetic network, as European, Asian, and African taurine all share a common ancestor, and the Asian clade is sister to the rest of the ingroup. This rules out the possibility of a separate domestication in Africa as a separate domestication would cause African domesticates to be sister to the rest of taurus. Larson and Burger (2013) do not consider admixture a separate domestication, and we choose to follow their definition. Two domestications with the resulting population in Africa a mixture of the two is not very parsimonious. The most parsimonious explanation is admixture from a wild relative.

We agree that we have not tested the influence of trypanosomiasis resistance on driving admixture, but we feel it is an interesting hypothesis that explains the force that drove admixture. We have rephrased the sentence as:

“We hypothesize that the introgression in Africa may have been driven by trypanosomiasis resistance in African auroch which may be the source of resistance in African taurine populations [48].”

“We detect exportation patterns in Asia and identify a cline of Eurasian taurine/indicine hybridization in Asia.”

The cline of taurine/indicine hybridization is based on interpretation of ADMIXTURE plots and some follow-up f4 statistics. I found this difficult to follow, especially since a significant f4 statistic can have multiple interpretations. Perhaps the authors could draw out the proposed phylogeny for these breeds and explain the reasons they chose particular f4 statistics to highlight.

We have added a map figure so that the ADMIXTURE estimates will be easier to interpret in a geographic frame. We also added, “From previous research [3] and Figures 2 and 3, these relationships should be tree-like if there were no admixture. For 53 of the possible 280 tests, the Z-score was more extreme than ±2.575829. The most extreme test statistics were f4(Wagyu, Mongolian; Simmental, Shorthorn) = -0.003 (Z-score = -5.21, other rearrangements of these groups had Z-scores of 7.32 and 16.55) and f4(Hanwoo, Wagyu; Piedmontese, Shorthorn) = 0.002 (Z-score = 4.90, other rearrangements of these groups had Z-scores of 21.79 and 27.77).”

While the f4 statistics do have multiple interpretations, we do feel confident that the ADMIXTURE analysis highlights which interpretation is the most likely.
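For readers wondering where the cutoff of ±2.575829 comes from: it matches the two-sided 1% critical value of a standard normal distribution. The short sketch below recovers it with Python's standard library; the significance level `alpha = 0.01` is inferred from the number itself and is not stated in the manuscript excerpt, and `is_significant` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumption: the cutoff corresponds to a two-sided test at alpha = 0.01.
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 6))

def is_significant(z, threshold=z_crit):
    """True when a Z-score is more extreme than the two-sided threshold."""
    return abs(z) > threshold

# Z-scores quoted in the response above.
for z in (-5.21, 4.90):
    print(z, is_significant(z))
```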

“We also identify the influence of species other than Bos taurus in the formation of Asian breeds.”

The conclusion that other species other than Bos taurus have introgressed into Asian breeds seems to be based on interpretation of branch lengths in the trees in Figures 2-3 and some f3 statistics. The interpretation of branch lengths is extremely weak evidence for introgression, probably not even worth mentioning. The f3 statistics are potentially quite informative though. For the breeds in question (Brebes and Madura), which pairs of populations give the most negative f3 statistics? This is difficult information to extract from Supplementary Table 2, where the populations appear to be sorted alphabetically. A table showing the (for example) five most negative f3 statistics could be quite useful here.

Supplementary Table 2 has been updated to report the 5 most negative statistics. The Z-scores for Brebes are smaller than -18 and the Z-scores for Madura are smaller than -13. We also note that these results are supported by the ADMIXTURE analysis.

In general, if the SNP ascertainment scheme is not extremely complicated (can the authors describe the ascertainment scheme for this array?), a negative f3 statistic is very strong evidence that a target population is admixed, while a significant f4 statistic only means that at least one of the four populations in the statistic is admixed. This might be a useful property for the authors.

The SNPs were ascertained in multiple ways: they were either a SNP in the reference Hereford animal, discovered from Sanger resequencing of 9 breeds, or discovered from reduced representation sequencing of Angus, Holstein, or a pool of breeds. Most of the SNPs were ascertained in Hereford, Angus, or Holstein.

“We detect the pronounced influence of Shorthorn cattle in the formation of European breeds.”

This conclusion appears to be based on interpretation of ADMIXTURE plots in Figures S6-S9. Interpreting these types of plots is notoriously difficult. I wonder if the f3 statistics might be useful here: do the authors get negative f3 statistics in the populations they write “share ancestry with Shorthorn cattle” when using the Durham shorthorns as one reference?

Durham Shorthorn is the ancestral breed of Beef Shorthorn, Milking Shorthorn, and Lincoln Red (reference 30 from the manuscript), and as these are direct relationships (tree-like) we wouldn't expect significant f-statistics. We added Table S3 to report the negative f3 statistics for Maine Anjou, Santa Gertrudis, and Beefmaster. We suspect Belgian Blue have undergone too much change in allele frequencies due to intense selection and small effective population sizes since admixture to produce significant f3 statistics. We have edited the sentence to say:

“As shown in Figures S6 through S9, Table S3, and from their breed histories [31], many breeds share ancestry with Shorthorn cattle, including Milking Shorthorn, Beef Shorthorn, Lincoln Red, Maine-Anjou, Belgian Blue, Santa Gertrudis, and Beefmaster.”

Charolais and Holstein did not produce significant f3 statistics. Although they did produce significant f4 statistics, we choose to not report these.

“Iberian and Italian cattle possess introgression from African taurine.”

This conclusion is based on ADMIXTURE plots and treemix; it would be interesting to see the results from f3 statistics as well.

We added this as the last paragraph of the Admixture in Europe subsection.

“We also used f-statistics to explore the evidence for African taurine introgression into Spain and Italy. We did not see any significant f3 statistics, but this test may be underpowered because of the low-level of introgression. With Italian and Spanish breeds as a sister group and African breeds, including Oulmès Zaer, as the other sister group, we see 321 significant tests out of 1911 possible tests. Of these 321 significant tests, 218 contained Oulmes Zaer. We also calculated f4 statistics with the Spanish breeds as sister and the African taurine breeds as sister (excluding Oulmes Zaer). With this setup, out of the possible 675 tests we only see 1 significant test, f4(Berrenda en Negro, Pirenaica;Lagune, N’Dama (ND2)) = 0.0007, Z-score = 3.064. With Italian cattle as sister and African taurine as sister (excluding Oulmes Zaer), we see 17 significant tests out of 90 possible. Patterson et al. [27] define the f4-ratio as f4(A, O; X, C)/f4(A, O; B, C), where A and B are a sister group, C is sister to (A,B), X is a mixture of B and C, and O is the outgroup. This ratio estimates the ancestry from B, denoted as α. We calculated this ratio using Shorthorn as A, Montbeliard as B, Lagune as C, Morucha as X, and Hariana as O. We choose Shorthorn, Montbeliard, Lagune, and Hariana as they appeared the least admixed in the ADMIXTURE analyses. We choose Morucha because it is solid red with African ancestry in Figure S10. This statistic estimates that Morucha is 91.23% European (α = 0.0180993/0.0198386) and 8.77% African, which is similar to the proportion estimated by TreeMix. The multiple f4 statistics with Italian breeds as sister and African breeds as the opposing sister support African admixture into Italy. The f4-ratio test with Morucha also supports our conclusion of African admixture in Spain.”

We understand that the f4 statistics are not as easily interpreted, but the f4-ratio seems to have a straight-forward interpretation.
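The arithmetic behind the Morucha estimate can be reproduced directly from the two f4 values quoted above. The sketch below only performs the division; the function name `f4_ratio` is hypothetical, and it does not compute the f4 statistics themselves or any standard error.

```python
def f4_ratio(f4_numerator, f4_denominator):
    """Return (alpha, 1 - alpha) for the ratio f4(A, O; X, C) / f4(A, O; B, C).

    alpha estimates the ancestry of X that comes from B.
    """
    alpha = f4_numerator / f4_denominator
    return alpha, 1 - alpha

# Values quoted in the manuscript passage above
# (A = Shorthorn, B = Montbeliard, C = Lagune, X = Morucha, O = Hariana).
alpha, remainder = f4_ratio(0.0180993, 0.0198386)
print(f"{alpha:.2%} European, {remainder:.2%} African")
```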

“American Criollo cattle are shown to be of Iberian, and not African, descent.”

I found this difficult to follow; the authors write that these breeds “derive 7.5% of their ancestry from African taurine introgression”, so presumably they are in fact partially of African descent?

We reworded as:

“American Criollo cattle are shown to be imported from Iberia, and not directly from Africa, and African ancestry is inherited via Iberian ancestors.”

“Indicine introgression into American cattle occurred in the Americas, and not Europe”

This conclusion seems difficult to make from genetic data. The authors identify “indicine” ancestry in American cattle, so I don’t see how they can determine whether this happened before or after a migration without temporal information. It would be helpful if the authors walk the reader through each logical step they’re making so that the reader can decide whether they believe the evidence for each step.

We added this sentence:

“To reiterate, Iberian cattle do not have indicine ancestry, American Criollo breeds originated from exportations from Iberia, Brahman cattle were developed in the United States in the 1880s [31], and American Criollo breeds carry indicine ancestry, and the introgression likely occurred from Brahman cattle.”

Other responses [NB: these are responses to comments from another reviewer, but his/her comments are not printed]:

We have attempted to make the manuscript easier to read for a wider audience, but welcome additional feedback.
NOTE TO BLOG READERS: Please send me your feedback as well! @pop_gen_JED

We have rearranged the nodes of Figure 4 and we believe it is now easier to read.

The position of a migration edge denotes where in time or ancestry the migration occurred. The more basal a migration edge is placed, the earlier in time the migration occurred or the more divergent the source population was.

As mentioned above, the placement of the migration edges is meaningful, so we prefer to keep the information displayed in this manner. We have added a brief explanation of TreeMix to the manuscript under the TreeMix analysis paragraph of the Methods section.

The geographic origin of all the populations is given in Table S1. We have edited these two sentences to say,

We find that the Indonesian Brebes (BRE) and Madura (MAD) breeds have significant Bos javanicus (BALI) ancestry demonstrated by the short branch lengths in Figures 2-4, shared ancestry with Bali in ADMIXTURE analyses (light green in Figures S7-S9), and significant f3 statistics (Table S2). The Indonesian Pesisir and Aceh and the Chinese Hainan and Luxi breeds also have Bali ancestry (migration edge c in Figure 4, and light green in Figures S8 and S9).

We agree that the reference to Murray adds confusion and have deleted these references from the manuscript.

We add “previously suggested” to this statement to identify that these two waves have previously been inferred in the literature from archeology and genetics. We also feel that the use of “possibly” suggests that this is an interpretation and not a concrete result. In regards to the evidence to support our interpretation, we see two analyses, ADMIXTURE and TreeMix, suggesting two clades of indicine introgression.

Durham Shorthorns are the ancestors of Beef Shorthorns, Milking Shorthorns, Lincoln Reds, Belgian Blues, and Maine Anjous. We add a parenthetical with a citation to reference 31 to clarify this.

Table 1 was moved to the supplement.

One of the main assumptions and conclusions of McTavish et al. is that there are no pure taurines in Africa; all cattle in Africa have indicine ancestry. Our results suggest that this is not true and pure taurines do exist in Africa. We have added, “Thus, we conclude that contrary to the assumptions and conclusions of [57] cattle with pure taurine ancestry do exist in Africa.”

Added “The f3 and f4 statistics look for correlations in allele frequencies that are not compatible with a bifurcating tree; these statistics provide support for admixture in the history of the tested populations [26,27].” as the first sentence of the f3 and f4 statistics subsection in the Methods section.
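To make the idea of "correlations in allele frequencies" concrete, here is a simplified Python sketch of the f3 and f4 statistics as plain averages over SNPs, following the standard definitions f3(C; A, B) = E[(c - a)(c - b)] and f4(A, B; C, D) = E[(a - b)(c - d)]. It is an illustration under stated assumptions, not the analysis used in the manuscript: it omits the finite-sample bias correction and the block jackknife needed for Z-scores, and the allele frequencies are made up.

```python
def f3(target, ref1, ref2):
    """Naive f3(target; ref1, ref2): mean of (c - a)(c - b) over SNPs.

    A clearly negative value is evidence that the target is admixed.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, ref1, ref2)]
    return sum(products) / len(products)

def f4(pop_a, pop_b, pop_c, pop_d):
    """Naive f4(A, B; C, D): mean of (a - b)(c - d) over SNPs.

    Expected to be zero when ((A, B), (C, D)) is a tree without admixture.
    """
    products = [(a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at three SNPs (not real data).
source1 = [0.1, 0.9, 0.2]
source2 = [0.9, 0.1, 0.8]
# A population whose frequencies sit exactly midway between the two sources.
mixed = [(a + b) / 2 for a, b in zip(source1, source2)]

print(f3(mixed, source1, source2))         # negative: consistent with admixture
print(f4(source1, source2, mixed, mixed))  # zero: C and D are identical here
```

In practice the statistics and their standard errors would come from dedicated software; this sketch is only meant to show why an intermediate population drives f3 negative.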

If cattle were separately domesticated in Africa they would be the most divergent taurine clade. But, TreeMix finds, separate from user intervention, that the best model for the relationship between indicine, Asian taurine, African taurine, and European taurine is indicine as the outgroup, European and African taurine as sister groups, and Asian taurine as the most divergent taurine group. I.e. (indicine,(Asian taurine, (African taurine, European taurine))). But, this model also includes admixture from an unsampled ancestral population that is approximately equally divergent from taurine and indicine. Our sampling is quite extensive and has sampled populations across Europe and Africa. But, we are unable to sample African auroch as they are extinct. Rather than separate domestication and admixture being indistinguishable, the gene frequencies suggest that there was introgression into African domesticated taurine from an ancestral population. We strongly feel the most parsimonious explanation is introgression from African auroch.

From Stock and Gifford-Gonzalez 2013, “The central fact around which disparate speculations about the origins of African cattle turn is one upon which all can also agree: northern Africa was home to wild aurochsen, Bos primigenius, from the Middle Pleistocene onwards (Linseele 2004).” We have added citations to Stock and Gifford-Gonzalez 2013 and Linseele 2004 to our manuscript.

Other changes:

Changed “elucidate” to “reveal” in Author Summary.

Second paragraph of TreeMix subsection of Methods section: Changed migration rate to migration proportion

Results section, Worldwide patterns subsection, 2nd paragraph, 6th sentence: Changed “(Figure 4)” to “(Figures 4 and 5, discussed in detail in the following subsections)”.

Results section, Divergence within the taurine lineage subsection, 1st paragraph: Added “We also see some runs of TreeMix placing a migration edge from Chianina cattle to Asian taurines (Figure S4).”

Integrative genomics analysis identifies pericentromeric regions of human chromosomes affecting patterns of inter-chromosomal interactions

Gennadi V. Glinsky
(Submitted on 10 Jan 2014)

Genome-wide analysis of distributions of densities of long-range interactions of human chromosomes with each other, nucleoli, nuclear lamina, and binding sites of chromatin state regulatory proteins, CTCF and STAT1, identifies non-random highly correlated patterns of density distributions along the chromosome length for all these features. Marked co-enrichments and clustering of all these interactions are detected at discrete genomic regions on selected chromosomes, which are located within pericentromeric heterochromatin and designated Centromeric Regions of Interphase Chromatin Homing (CENTRICH). CENTRICH manifest 199-716-fold higher density of inter-chromosomal binding sites compared to genome-wide or chromosomal averages (p = 2.10E-101-1.08E-292). Sequence alignment analysis shows that CENTRICH represent unique DNA sequences of 3.9 to 22.4 Kb in size which are: 1) associated with nucleolus; 2) exhibit highly diverse set of DNA-bound chromatin state regulators, including marked enrichment of CTCF and STAT1 binding sites; 3) bind multiple intergenic disease-associated genomic loci (IDAGL) with documented long-range enhancer activities and established links to increased risk of developing epithelial malignancies and other common human disorders. Using distances of SNP loci homing sites within genomic coordinates of CENTRICH as a proxy of likelihood of disease-linked SNP loci binding to CENTRICH, we demonstrate statistically significant correlations between the probability of SNP loci binding to CENTRICH and GWAS-defined odds ratios of increased risk of a disease for cancer, coronary artery disease, and type 2 diabetes. Our analysis suggests that centromeric sequences and pericentromeric heterochromatin may play an important role in human cells beyond the critical functions in chromosome segregation.

Palaeosymbiosis revealed by genomic fossils of Wolbachia in a strongyloidean nematode

Georgios Koutsovoulos, Benjamin Makepeace, Vincent N. Tanya, Mark Blaxter
(Submitted on 10 Jan 2014)

Wolbachia are common endosymbionts of terrestrial arthropods, and are also found in nematodes: the animal-parasitic filaria, and the plant-parasite Radopholus similis. Lateral transfer of Wolbachia DNA to the host genome is common. We generated a draft genome sequence for the strongyloidean nematode parasite Dictyocaulus viviparus, the cattle lungworm. In the assembly, we identified nearly 1 Mb of sequence with similarity to Wolbachia. The fragments were unlikely to derive from a live Wolbachia infection: most were short, and the genes were disabled through inactivating mutations. Many fragments were co-assembled with definitively nematode-derived sequence. We found limited evidence of expression of the Wolbachia-derived genes. The D. viviparus Wolbachia genes were most similar to filarial strains, and strains from the host-promiscuous clade F. We conclude that D. viviparus was infected by Wolbachia in the past. Genome sequence based surveys are a powerful tool for revealing the genome archaeology of infection and symbiosis.

VCF2Networks: applying Genotype Networks to Single Nucleotide Variants data

Giovanni Marco Dall’Olio, Ali R. Vahdat, Bertranpetit Jaume, Wagner Andreas, Laayouni Hafid
(Submitted on 9 Jan 2014)

Summary: Genotype networks are a method used in systems biology to study the innovability of a given phenotype, determining whether the phenotype is robust to mutations, and how the genotypes associated with it are distributed in the genotype space. Here we developed VCF2Networks, a tool to apply this method to population genetics data, and in particular to Single Nucleotide Variants data encoded in the Variant Call Format (VCF). A complete summary of the properties of the genotype network that can be calculated by VCF2Networks is given in the Supplementary Materials 1.
Availability and Implementation: The home page of the project is this https URL . VCF2Networks is also available directly from the Python Package Index (PyPI), under the name vcf2networks.

Author post: Physical constraints determine the logic of bacterial promoter architectures

Our next guest post is by Radu Zabet on his manuscript (with co-workers) Physical constraints determine the logic of bacterial promoter architectures, arXived here

Earlier last year we explored the possibility of understanding ‘real biology’ using our stochastic simulation framework GRiP (http://logic.sysbiol.cam.ac.uk/grip). That software simulates how transcription factors (TFs) find their target sites in the genome, using a combination of three-dimensional diffusion around and one-dimensional walk on the DNA. This biophysical mechanism is quite well studied and is commonly termed ‘facilitated diffusion’. Unlike that of a homing missile, the path of a TF molecule to its target site is somewhat erratic, and with many other factors around, even ‘traffic jams’ on the DNA seem possible (that and other interesting phenomena were the subject of two other arXiv contributions we put online last year; see Haldane’s Sieve for more, https://haldanessieve.org/2013/04/09/our-paper-the-effects-of-transcription-factor-competition-on-gene-regulation/ and the two publications: http://dx.doi.org/10.3389/fgene.2013.00197 and http://dx.doi.org/10.1371/journal.pone.0073714).

Oftentimes, TF binding sites are closely packed or even overlapping. In our latest paper, we explore how the spacing of binding sites along the DNA can influence the probability of a “TF traffic jam” occurring, and thereby influence the length of a TF’s “commute” to its binding site (http://arxiv.org/abs/1312.7262). We notice that one of the promoter organisations that we predict would cause massive traffic jams is underrepresented among E. coli promoters, suggesting that this phenomenon may have an important biological role.

One of the most common approaches to predicting TF occupancy is statistical thermodynamics, which assumes that the system is in steady state. Here we show that under biologically relevant parameters, a TF might take longer than a cell cycle to arrive at its binding site when the promoter is organized in a “traffic jam” inducing way. Therefore, it is important to consider the dynamics of TF binding, rather than just the steady state.

Usually, transcriptional logic refers to the idea that the specific combinations of TFs that bind to a gene promoter control the expression level of that gene. We extend this notion of transcriptional logic by proposing that the response to multiple regulatory inputs can also depend on the dynamics of TF binding. In other words: not only the final combinatorial pattern, but also the order in which these sites are occupied matters. In this context, we suggest that the spatial organisation of the promoter encodes the logic, influenced by TF concentrations that modulate promoter occupancy dynamics in biologically relevant time scales.

Using computer simulations of the search process, we show that the logic of complex bacterial promoters can be explained by the combinatorial action of three commonly found basic building blocks: switches, barriers and clusters, whose characteristics we analyse in detail.

The precise spacing of TF binding sites plays a key role in our model, and we show that physically constrained promoter organizations are commonly found in bacterial genomes and are conserved.

Finally, we also developed a new web-based computational tool (faster GRiP, or fGRIP), which is able to generate the dynamics of promoter occupancy for bacterial systems. This tool is available at http://logic.sysbiol.cam.ac.uk/fgrip/

Distribution of population averaged observables in stochastic gene expression

Bhaswati Bhattacharyya, Ziya Kalay
(Submitted on 9 Jan 2014)

Observation of phenotypic diversity in a population of genetically identical cells is often linked to the stochastic nature of chemical reactions involved in gene regulatory networks. We investigate the distribution of population averaged gene expression levels as a function of population, or sample, size for several stochastic gene expression models to find out to what extent population averaged quantities reflect the underlying mechanism of gene expression. We consider three basic gene regulation networks corresponding to transcription with and without gene state switching and translation. Using analytical expressions for the probability generating function of observables and Large Deviation Theory, we calculate the distribution and first two moments of the population averaged mRNA and protein levels as a function of model parameters, population size and number of measurements contained in a data set. We validate our results using stochastic simulations and also report exact results on the asymptotic properties of population averages which show qualitative differences among different models.

Physical constraints determine the logic of bacterial promoter architectures

Daphne Ezer, Nicolae Radu Zabet, Boris Adryan
(Submitted on 27 Dec 2013)

Site-specific transcription factors (TFs) bind to their target sites on the DNA, where they regulate the rate at which genes are transcribed. Bacterial TFs undergo facilitated diffusion (a combination of 3D diffusion around and 1D random walk on the DNA) when searching for their target sites. Using computer simulations of this search process, we show that the organisation of the binding sites, in conjunction with TF copy number and binding site affinity, plays an important role in determining not only the steady state of promoter occupancy, but also the order in which TFs bind. These effects can be captured by facilitated diffusion-based models, but not by standard thermodynamics. We show that the spacing of binding sites encodes complex logic, which can be derived from combinations of three basic building blocks: switches, barriers and clusters, whose response alone and in higher orders of organisation we characterise in detail. Effective promoter organizations are commonly found in the E. coli genome and are highly conserved between strains. This will allow studies of gene regulation at an unprecedented level of detail, where our framework can create testable hypotheses of promoter logic.

Reconstructing transmission networks for communicable diseases using densely sampled genomic data: a generalized approach

Colin J. Worby, Philip D. O’Neill, Theodore Kypraios, Julie V. Robotham, Daniela De Angelis, Edward J. P. Cartwright, Sharon J. Peacock, Ben S. Cooper
(Submitted on 8 Jan 2014)

Probabilistic reconstruction of transmission networks for communicable diseases can provide important insights into epidemic dynamics, the effectiveness of infection control measures, and contact patterns in an at-risk population. Whole genome sequencing of pathogens from multiple hosts provides an opportunity to investigate who infected whom with unparalleled resolution. We considered disease outbreaks in a community with high frequency genomic sampling, and formulated stochastic epidemic models to investigate person-to-person transmission, based on genomic and epidemiological data. Our approach, which combines a stochastic epidemic transmission model with a genetic distance model, overcomes key limitations of previous methods by providing a framework with the flexibility to allow for unobserved infection times, multiple independent introductions of the pathogen, and within-host genetic diversity, as well as allowing forward simulation. We defined two genetic models: a transmission diversity model, in which genetic diversity increases along a transmission chain, and an importation structure model, which groups isolates into genetically similar clusters. We evaluated their predictive performance using simulated data, demonstrating high sensitivity and specificity, particularly for rapidly mutating pathogens with low transmissibility. We then analyzed data collected during an outbreak of MRSA in a hospital. We identified three probable transmission events (posterior probability > 0.5) among the twenty observed cases. We estimated that genetic diversity across transmission links was approximately the same as within-host, with an expected 3.9 (95% CrI: 3.3, 4.6) single nucleotide polymorphisms between isolates. Our methodology avoids restrictive assumptions required in many analyses, and has broad applicability to epidemics with densely sampled genomic data.

Author post: Extensive Phenotypic Changes Associated with Large-scale Horizontal Gene Transfer

Our next guest post is by David Baltrus (@surt_lab) on his group’s preprint Extensive Phenotypic Changes Associated with Large-scale Horizontal Gene Transfer, posted on bioRxiv here.

The function of modern pickup trucks is usually to haul heavy loads from point A to point B. However, the F-150 sitting in my driveway right now looks very different from its Model T ancestor from ~100 years ago. Over the years, as truck design has been modified and improved, all of the parts (brakes, air conditioning systems, doors, wheels, etc…) have been crafted to fit and work efficiently together. In the process, each of the parts you see on a pickup truck today has been selectively co-evolving with all of the other design elements on the truck. The function of a house is to provide shelter. You can easily extend the co-evolutionary metaphor from above to explain how different aspects of the house I live in relate to one another.

Some time ago, someone had the brilliant idea to merge houses and pickup trucks into a camper. These hybrids between pickups and houses provide the functionality of being able to drive around, while also maintaining the ability to provide shelter. However, in the beginning, these hybrids likely didn’t accelerate as fast and consumed more energy and resources than unweighted pickups. They were likely a little taller than unweighted pickups, and as such might not have been able to use certain bridges or tunnels. The brakes probably didn’t work as well. I could go on and on, but that would belabor the point I’m trying to make. In the beginning, if you just place two independently designed systems together Rube Goldberg style, the result will likely be functional but inefficient. Over the years, as engineers have worked to smoothly merge all of the systems of pickup and house together, campers have gotten much better at doing both jobs simultaneously.

Fig. 1: A truck-house hybrid is born. Images from Wikipedia

Fast and accurate alignment of long bisulfite-seq reads

Brent S. Pedersen, Kenneth Eyring, Subhajyoti De, Ivana V. Yang, David A. Schwartz
(Submitted on 6 Jan 2014)

Summary: Longer sequencing reads, with at least 200 bases per template, are now common. While traditional aligners have adopted new strategies to improve the mapping of longer reads, aligners specific to bisulfite-sequencing were optimized when much shorter reads were the norm. We sought to perform the first comparison using longer reads to determine which aligners were most accurate and efficient and to evaluate a novel software tool, bwa-meth, built on a traditional mapper that supports insertions, deletions and clipped alignments. We gauge accuracy by comparing the number of on- and off-target reads from a targeted sequencing project and by simulations. Availability and Implementation: The benchmarking scripts and the bwa-meth software are available at this https URL under the MIT License.
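For readers unfamiliar with how a traditional mapper can be reused for bisulfite data, the sketch below shows the generic in silico conversion idea: convert C to T in both the read and the reference so that bisulfite-converted reads still match, then recover methylation calls from the original bases. The abstract does not describe bwa-meth's internals, so this should be read as a toy illustration of the general approach rather than the tool's implementation. The exact-match "aligner", the sequences, and all function names are assumptions, and only the forward strand is handled.

```python
def convert_c_to_t(sequence):
    """In silico bisulfite conversion of a forward-strand sequence (C -> T)."""
    return sequence.upper().replace("C", "T")


def toy_align(reference, read):
    """Toy stand-in for a real mapper: exact match in converted (three-letter) space.

    Returns the 0-based reference offset, or -1 if the read does not match.
    """
    return convert_c_to_t(reference).find(convert_c_to_t(read))


def methylation_calls(reference, read, offset):
    """For each reference cytosine covered by the read, report whether the read kept a C.

    A retained C means the cytosine was protected from conversion (methylated);
    a T at that position means it was converted (unmethylated).
    """
    calls = []
    for i, read_base in enumerate(read.upper()):
        position = offset + i
        if reference[position].upper() == "C":
            calls.append((position, read_base == "C"))
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = "ACGTTCGA"  # hypothetical read: the cytosine at position 4 was converted to T
    offset = toy_align(reference, read)
    print(offset, methylation_calls(reference, read, offset))
```

A real pipeline would hand the converted sequences to a full aligner that tolerates mismatches, insertions, deletions and clipping, and would also treat the reverse strand (G to A conversion).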