Phylogenetic analysis accounting for age-dependent death and sampling with applications to epidemics

Amaury Lambert, Helen K. Alexander, Tanja Stadler
(Submitted on 14 Jun 2013)

Phylogenetic trees reconstructed from viral genetic sequence data sampled sequentially from an epidemic provide estimates of past transmission dynamics when epidemiological models are fitted to these trees. To our knowledge, none of the epidemiological models currently used in phylogenetics can account for recovery rates and sampling rates that depend on the time elapsed since transmission.
Here we introduce an epidemiological model where infectives leave the epidemic, either by recovery or sampling, after some random time which may follow an arbitrary distribution.
We derive an expression for the likelihood of the phylogenetic tree of sampled infectives under our general epidemiological model. The analytic concept developed in this paper will facilitate inference of past epidemiological dynamics and provide an analytical framework for performing very efficient simulations of phylogenetic trees under our model. The main idea of our analytic study is that the non-Markovian epidemiological model, which gives rise to phylogenetic trees growing vertically as time goes by, can be represented by a Markovian “coalescent point process” growing horizontally by the sequential addition of pairs of coalescence and sampling times.
As examples, we discuss two special cases of our general model, namely an application to influenza and an application to HIV. Though phrased in epidemiological terms, our framework can also be used, for instance, to fit macroevolutionary models to phylogenies of extant and extinct species, accounting for general species lifetime distributions.
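For readers unfamiliar with coalescent point processes (CPPs), the following is a minimal sketch of the classical version only: a reconstructed tree is encoded by node depths drawn i.i.d. until one exceeds the stem age, and the coalescence time of two tips is the largest depth lying between them. It does not implement the paper's Markovian process built from pairs of coalescence and sampling times; the function names and the exponential depth distribution are hypothetical choices made for illustration.

```python
import random


def simulate_cpp_depths(stem_age, draw_depth, rng):
    """Draw i.i.d. node depths until one exceeds stem_age.

    The returned list has one entry fewer than the number of tips; entry k
    is the depth of the node separating tip k from tip k + 1.
    """
    depths = []
    while True:
        h = draw_depth(rng)
        if h > stem_age:  # the first depth beyond the stem age ends the tree
            return depths
        depths.append(h)


def coalescence_time(depths, i, j):
    """Time back to the most recent common ancestor of tips i and j."""
    if i == j:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    # Tips lo and hi are separated by nodes lo .. hi - 1; the deepest is their MRCA.
    return max(depths[lo:hi])


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed depth distribution: exponential with mean 2.5 time units.
    depths = simulate_cpp_depths(5.0, lambda r: r.expovariate(0.4), rng)
    print(len(depths) + 1, "tips; node depths:", depths)
    if len(depths) >= 2:
        print("T(0, 2) =", coalescence_time(depths, 0, 2))
```

Each additional tip costs one fresh draw, so the tree is built tip by tip in linear time, which gives some intuition for why a horizontal representation lends itself to efficient simulation.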

Interfertile oaks in an island environment: I. High nuclear genetic differentiation and high degree of chloroplast DNA sharing between Q. alnifolia and Q. coccifera in Cyprus. A multipopulation study

Charalambos Neophytou, Aikaterini Dounavi, Siegfried Fink, Filippos A. Aravanopoulos
(Submitted on 11 Jun 2013)

The evergreen Quercus alnifolia and Q. coccifera form the only interfertile pair of oak species growing in Cyprus. Hybridization between the two species has already been observed and studied morphologically. However, little evidence exists on the extent of genetic introgression. In the present study, we aimed to study the effects of introgressive hybridization on both the chloroplast and the nuclear genomes. We sampled both pure and mixed populations of Q. alnifolia and Q. coccifera from several locations across their distribution area in Cyprus. We analyzed the genetic variation within and between species by conducting an Analysis of Molecular Variance (AMOVA) based on nuclear microsatellites. Population genetic structure and levels of admixture were studied by means of a Bayesian clustering analysis (STRUCTURE). Chloroplast DNA microsatellites were used for a spatial analysis of genetic barriers. Most of the nuclear genetic variation was explained by partitioning into species groups. High interspecific differentiation and low admixture of nuclear genomes, in both pure and mixed populations, support limited genetic introgression between Q. alnifolia and Q. coccifera in Cyprus. In contrast, chloroplast DNA haplotypes were shared between the species and were locally structured, suggesting cytoplasmic introgression. Occasional hybridization events followed by backcrossing with both parental species might lead to this pattern of genetic differentiation.

Interfertile oaks in an island environment. II. Limited hybridization between Quercus alnifolia Poech and Q. coccifera L. in a mixed stand

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 11 Jun 2013)

Hybridization and introgression between Quercus alnifolia Poech and Q. coccifera L. are studied by analyzing morphological traits and nuclear and chloroplast DNA markers. The study site is a mixed stand in the Troodos Mountains (Cyprus), and the analyzed material includes both adult trees and progenies of specific mother trees. Multivariate analysis of morphological traits shows that the two species can be well distinguished using simple leaf morphometric parameters. An analysis of nuclear and chloroplast microsatellites supports lower genetic diversity in Q. alnifolia than in Q. coccifera and high interspecific differentiation between the two species. The intermediacy of the four designated hybrids is verified by both leaf morphometric and genetic data. Analysis of progeny arrays provides evidence that interspecific crosses are rare. This finding is further supported by limited introgression of chloroplast genomes. Reproductive barriers (e.g. asynchronous phenology, post-zygotic incompatibilities) might account for this result. A genetic assignment analysis of effective pollen clouds indicates that interspecific gene flow is directional, with Q. alnifolia acting as pollen donor. Differences in flowering phenology and species distribution in the stand may have influenced the direction of gene flow and the genetic differentiation among effective pollen clouds of different mother trees within species.

Upper Rhine Valley: A migration crossroads of middle European oaks

Charalambos Neophytou, Hans-Gerhard Michiels
(Submitted on 10 Jun 2013)

The indigenous oak species (Quercus spp.) of the Upper Rhine Valley migrated to their current distribution range in the area after the transition to the Holocene interglacial. Since post-glacial recolonization, they have been subjected to ecological changes and human impact. Using chloroplast microsatellite markers (cpSSRs), we provide detailed phylogeographic information and address the contribution of natural and human-related factors to the current pattern of chloroplast DNA (cpDNA) variation. A total of 626 individual trees from 86 oak stands, including all three indigenous oak species of the region, were sampled. To verify their refugial origin, reference samples from refugial areas and DNA samples from previous studies with known cpDNA haplotypes (chlorotypes) were used. Chlorotypes belonging to three different maternal lineages, corresponding to the three main glacial refugia, were found in the area. These were spatially structured and highly introgressed among species, reflecting past hybridization involving all three indigenous oak species. Groups of populations that differed in cpDNA variation were also found to differ in site conditions. This suggests that different biogeographic subregions within the Upper Rhine Valley were colonized during separate post-glacial migration waves. Genetic variation was higher in Quercus robur than in Quercus petraea, which is probably due to the more efficient seed dispersal and the more pronounced pioneer character of the former species. Finally, stands of Q. robur established in the last 70 years were significantly more diverse, which can be explained by the improved ability to transport seeds and seedlings for the artificial regeneration of stands during this period.
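Comparisons such as "genetic variation was higher in Q. robur" or "stands were significantly more diverse" rest on a within-group diversity statistic. The abstract does not say which one was used; as an assumption, this sketch computes Nei's unbiased haplotype diversity, h = n/(n - 1) * (1 - sum of p_i^2), where p_i is the frequency of chlorotype i among n sampled trees. The function name and chlorotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(chlorotypes):
    """Nei's unbiased haplotype diversity for one stand or species.

    chlorotypes: one chlorotype label per sampled tree.
    """
    n = len(chlorotypes)
    if n < 2:
        raise ValueError("need at least two sampled trees")
    counts = Counter(chlorotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor removes the small-sample bias of 1 - sum(p_i^2).
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical stands: one dominated by a single chlorotype, one more mixed.
old_stand = ["A", "A", "A", "A", "B"]
young_stand = ["A", "B", "C", "A", "C"]
print(haplotype_diversity(old_stand), haplotype_diversity(young_stand))
```

For these made-up stands the values are roughly 0.4 and 0.8; a formal comparison between groups of stands would additionally need a significance test, which is not shown.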

Detecting interspecific and geographic differentiation patterns in two interfertile oak species (Quercus petraea (Matt.) Liebl. and Q. robur L.) using small sets of microsatellite markers

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 10 Jun 2013)

Genetic analysis was carried out to provide insights into differentiation among populations of two interfertile oak species, Quercus petraea and Quercus robur. Gene flow between the two species, local adaptation and speciation processes in general may leave distinct molecular signatures across the genome. Three interspecific pairs of natural populations from three ecologically different regions, one in central Europe (SW Germany) and two in the Balkan Peninsula (Greece and Bulgaria), were sampled. Highly informative SSR loci were grouped according to the component of variation they express: interspecific or provenance-specific. Species- and provenance-discriminant loci were characterized based on FST values. Locus-specific FST values were tested for deviation from the neutral expectation both within and between species. The data for each group of loci were then analyzed separately in a Bayesian analysis of genetic structure. By using three species-discriminant loci, high membership probability to inferred species groups was achieved. On the other hand, the genetic structure inferred from five provenance-discriminant loci was correlated with geographic region and revealed shared genetic variation between neighbouring Q. petraea and Q. robur. Small sets of highly variable nuclear SSRs were sufficient to discriminate either between species or between provenances. Thus, an effective tool is provided for molecular identification of both species and provenances. Furthermore, the data suggest that a combination of gene flow and natural selection shapes these diversity patterns. Species-discriminant loci might represent genome regions affected by directional selection, which maintains species identity. Provenance-specific loci might represent genome regions with high interspecific gene flow and common adaptive patterns to local environmental factors.
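Locus-by-locus differentiation, of the kind used here to sort loci into species-discriminant and provenance-discriminant sets, can be illustrated with Nei's G_ST, an FST analogue computed from allele frequencies as (H_T - H_S) / H_T. This is an assumed stand-in: the study's exact FST estimator and its test against the neutral expectation are not reproduced, sample-size corrections are ignored, and the allele sizes below are invented.

```python
def g_st(subpop_freqs):
    """Nei's G_ST for one locus.

    subpop_freqs: one dict per subpopulation mapping allele -> frequency
    (frequencies within each dict sum to 1). Subpopulations are weighted equally.
    """
    k = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(1.0 - sum(f * f for f in p.values()) for p in subpop_freqs) / k
    # H_T: expected heterozygosity of the pooled allele frequencies.
    pooled = [sum(p.get(a, 0.0) for p in subpop_freqs) / k for a in alleles]
    h_t = 1.0 - sum(f * f for f in pooled)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Hypothetical microsatellite allele sizes (bp) and frequencies in two samples.
petraea = {180: 0.7, 184: 0.3}
robur = {184: 0.2, 188: 0.8}
print(g_st([petraea, robur]))
```

This hypothetical locus gives about 0.44; computing the same statistic once between species and once between provenances is one simple way to see which component of variation a locus mainly expresses.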

On the accumulation of deleterious mutations during range expansions

Stephan Peischl, Isabelle Dupanloup, Mark Kirkpatrick, Laurent Excoffier
(Submitted on 7 Jun 2013)

We investigate the effect of spatial range expansions on the evolution of fitness when beneficial and deleterious mutations co-segregate. We perform individual-based simulations of a uniform linear habitat and complement them with analytical approximations for the evolution of mean fitness at the edge of the expansion. We find that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load. Reduced fitness due to the expansion load is not restricted to the wave front but occurs over a large proportion of newly colonized habitats. The expansion load can persist and represent a major fraction of the total mutation load thousands of generations after the expansion. Our results extend qualitatively and quantitatively to two-dimensional expansions. The phenomenon of expansion load may explain growing evidence that populations that have recently expanded, including humans, show an excess of deleterious mutations. To test the predictions of our model, we analyze patterns of neutral and non-neutral genetic diversity in humans and find an excellent fit between theory and data.
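A toy serial-founder model conveys the intuition behind expansion load: each newly colonized deme is founded by only a few individuals from the current edge, so drift at the front can overwhelm selection against deleterious mutations. This is a minimal sketch, not the authors' individual-based simulation. It assumes multiplicative fitness (1 - s)^k for an individual carrying k deleterious mutations, Poisson(u) new mutations per offspring, instantaneous regrowth to carrying capacity, and no migration or beneficial mutations; all names and parameter values are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def next_generation(pop, size, u, s, rng):
    """One Wright-Fisher generation.

    pop holds each individual's count k of deleterious mutations; parents are
    drawn with probability proportional to fitness (1 - s) ** k, and every
    offspring gains Poisson(u) new mutations.
    """
    weights = [(1.0 - s) ** k for k in pop]
    parents = rng.choices(pop, weights=weights, k=size)
    return [k + poisson(u, rng) for k in parents]


def expansion_load(n_demes=20, founders=5, capacity=100, gens_per_deme=10,
                   u=0.1, s=0.01, seed=1):
    """Toy serial-founder expansion versus a stationary core deme.

    Returns (mean mutation count in the core, mean count in the newest edge
    deme) after the same number of generations.
    """
    rng = random.Random(seed)
    core = [0] * capacity
    edge = [0] * capacity
    for _ in range(n_demes):
        # Colonize a new deme with a small sample from the current edge deme.
        edge = rng.sample(edge, founders)
        for _ in range(gens_per_deme):
            # The new deme regrows to capacity at once; the core evolves in parallel.
            edge = next_generation(edge, capacity, u, s, rng)
            core = next_generation(core, capacity, u, s, rng)
    return sum(core) / capacity, sum(edge) / capacity


if __name__ == "__main__":
    for seed in range(3):
        print(seed, expansion_load(seed=seed))
```

Comparing the two returned means across several seeds, and varying the number of founders, is one informal way to explore whether repeated founder events inflate the deleterious load at the front relative to a stationary deme of the same size.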

Reconstructing the Population Genetic History of the Caribbean

Andres Moreno-Estrada, Simon Gravel, Fouad Zakharia, Jacob L. McCauley, Jake K. Byrnes, Christopher R. Gignoux, Patricia A. Ortiz-Tello, Ricardo J. Martinez, Dale J. Hedges, Richard W. Morris, Celeste Eng, Karla Sandoval, Suehelay Acevedo-Acevedo, Juan Carlos Martinez-Cruzado, Paul J. Norman, Zulay Layrisse, Peter Parham, Esteban Gonzalez Burchard, Michael L. Cuccaro, Eden R. Martin, Carlos D. Bustamante
(Submitted on 3 Jun 2013)

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, by making use of genome-wide SNP array data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures in present-day Afro-Caribbean genomes and shedding light on the genetic impact of the dynamics of the slave trade in the Caribbean.
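Simple bookkeeping shows why the number and timing of admixture pulses cannot be read off genome-wide ancestry proportions alone. If each pulse replaces a fraction m of the current gene pool with migrants, the migrant ancestry after pulses m_1 and m_2 is 1 - (1 - m_1)(1 - m_2), regardless of their order, so two pulses can leave the same overall proportion as a single larger pulse. The pulse sizes in this sketch are hypothetical; in practice, separating such scenarios requires additional information, such as the length distribution of ancestry tracts along chromosomes.

```python
def migrant_ancestry(pulses):
    """Migrant ancestry fraction after successive admixture pulses.

    pulses: migrant fractions in chronological order; each pulse replaces
    that fraction of the current gene pool with migrants.
    """
    frac = 0.0
    for m in pulses:
        if not 0.0 <= m <= 1.0:
            raise ValueError("pulse fractions must lie in [0, 1]")
        frac = frac * (1.0 - m) + m
    return frac


two_pulses = migrant_ancestry([0.1, 0.2])  # hypothetical early and later pulses
one_pulse = migrant_ancestry([0.28])       # a single pulse of matching size
print(two_pulses, one_pulse)
```

Both values are 0.28 up to floating-point rounding, which is exactly the ambiguity that demographic models using additional signals are needed to resolve.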

Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs

Adam H. Freedman, Rena M. Schweizer, Ilan Gronau, Eunjung Han, Diego Ortega-Del Vecchyo, Pedro M. Silva, Marco Galaverni, Zhenxin Fan, Peter Marx, Belen Lorente-Galdos, Holly Beale, Oscar Ramirez, Farhad Hormozdiari, Can Alkan, Carles Vilà, Kevin Squire, Eli Geffen, Josip Kusak, Adam R. Boyko, Heidi G. Parker, Clarence Lee, Vasisht Tadigotla, Adam Siepel, Carlos D. Bustamante, Timothy T. Harkins, Stanley F. Nelson, Elaine A. Ostrander, Tomas Marques-Bonet, Robert K. Wayne, John Novembre
(Submitted on 31 May 2013)

To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we analyzed novel high-quality genome sequences of three gray wolves, one from each of three putative centers of dog domestication, two ancient dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. We find that dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow, which confounds previous inferences of dog origins. In dogs, the domestication bottleneck was severe, involving a 17- to 49-fold reduction in population size, a much stronger bottleneck than estimated previously from less intensive sequencing efforts. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was far larger than that represented by modern wolf populations. Conditional on mutation rate, we narrow the plausible range for the date of initial dog domestication to an interval from 11 to 16 thousand years ago. This period predates the rise of agriculture, implying that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find, surprisingly, that none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and the sampled wolves instead form a sister monophyletic clade. This result, in combination with our finding of dog-wolf admixture during the process of domestication, suggests that past hypotheses of dog origin need to be re-evaluated. Finally, we also detect signatures of selection, including evidence for selection on genes implicated in morphology, metabolism, and neural development. Uniquely, we find support for selective sweeps at regulatory sites, suggesting that gene regulatory changes played a critical role in dog domestication.
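The qualifier "conditional on mutation rate" matters because divergence times estimated from sequence data are in mutation-scaled units. If a divergence is estimated as tau = T * mu (expected mutations per site accumulated over T generations), converting it to years requires an assumed per-generation mutation rate mu and generation time, and the resulting date scales as 1/mu. The values in this sketch are hypothetical and are not estimates from the paper.

```python
def divergence_years(tau, mu_per_gen, gen_time_years):
    """Convert a mutation-scaled divergence time into years.

    tau: divergence in expected mutations per site (tau = T * mu).
    mu_per_gen: assumed mutation rate per site per generation.
    gen_time_years: assumed generation time in years.
    """
    generations = tau / mu_per_gen
    return generations * gen_time_years


tau = 1e-4  # hypothetical scaled divergence
for mu in (1e-8, 0.5e-8):
    # Halving the assumed mutation rate doubles the inferred date.
    print(mu, round(divergence_years(tau, mu, 3.0)))
```

With these made-up inputs the dates come out at about 30,000 and 60,000 years, illustrating how strongly a calibrated date depends on the assumed rate.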

Our paper: Genetic recombination is targeted towards gene promoter regions in dogs

This guest post is by Adam Auton (@adamauton) on his paper (along with coauthors), “Genetic recombination is targeted towards gene promoter regions in dogs”, which has been posted on arXiv.

In this paper, we investigate the age-old question of how meiotic recombination is distributed in the genome of dogs. Before you stop reading, I’d like to spend a couple of paragraphs explaining why this is an interesting topic.

Recombination in mammalian genomes tends to occur in highly localized regions known as recombination hotspots. There are probably around 30,000 recombination hotspots in the human genome, each of which is about 2 kb wide, with recombination rates that can be thousands of times those of the surrounding region. Until a few years ago, the mechanism by which recombination hotspots are localized was largely unknown. This all began to change with the discovery of PRDM9 as the gene responsible for localizing hotspots [1-3]. The role of PRDM9 is to recognize and bind to specific DNA motifs in the genome, which are subsequently epigenetically marked as preferred locations of recombination.

PRDM9 turns out to be quite a fascinating gene. There is extensive variation in PRDM9 both within and across species, which points to strong selective pressures. Importantly, variation in PRDM9 can alter the recognized DNA motifs, thereby altering the locations of recombination hotspots in the genome. The high level of variation in PRDM9 between species appears to explain why recombination hotspots tend not to be shared even between closely related species, such as humans and chimpanzees.

We’ve learnt much about the importance of PRDM9 from studies in mice. Knock-out of Prdm9 in mice results in infertility and, most interestingly of all, certain alleles of mouse Prdm9 appear to be incompatible with each other [4,5]. Specifically, Mus m. musculus / Mus m. domesticus hybrid male mice are infertile if they are heterozygous for specific Prdm9 alleles. As such, Prdm9 has been called a ‘speciation gene’, as it has the potential to restrict gene flow between nascent species, and it is the only known example of such a gene in mammals.

Given this importance, it was surprising to note that dogs, uniquely amongst mammals, appear to carry a dysfunctional version of PRDM9 [6]. This raises the question of how recombination occurs in dogs, and provides the motivation for our paper.

Estimating recombination rates directly is challenging and costly, as only a few dozen events occur during any given meiosis. Therefore, to characterize large numbers of recombination events on a genome-wide basis, large pedigrees need to be genotyped, which can be both laborious and costly to do in non-model organisms. Luckily, an experiment of this nature has been previously performed in dogs, which revealed a recombination landscape that was reasonably consistent with patterns observed in other mammals [7].

However, without enormous sample sizes, such methods can only investigate patterns at scales far greater than the scale of individual hotspots. In order to investigate fine-scale patterns on a genome-wide basis, one must turn to indirect statistical methods, and it is this approach that we have adopted in our study. First, we whole-genome sequenced a collection of 51 outbred dogs and used these data to call single nucleotide polymorphisms. We then used the statistical method LDhat, which infers historical recombination rates via analysis of patterns of linkage disequilibrium. This is similar to the approach adopted by Axelsson et al. [8], who used microarrays to gain strong insights into canine recombination, although our use of sequencing allows us to investigate patterns at a much finer scale.
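LDhat fits the population-scaled recombination rate rho = 4 N_e r using a composite likelihood over pairs of SNPs; that machinery is not reproduced here. As a simpler illustration of the raw signal such methods draw on, this sketch computes the classic pairwise statistic r^2 between two biallelic SNPs from phased haplotypes. The input format (one tuple of 0/1 alleles per haplotype) and the example data are assumptions for illustration.

```python
def r_squared(haplotypes):
    """Pairwise LD statistic r^2 between two biallelic SNPs.

    haplotypes: phased haplotypes, each a tuple (allele at SNP 1, allele at
    SNP 2) coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical haplotypes: SNP 2 mostly tracks SNP 1, with one recombinant.
haps = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0)]
print(r_squared(haps))
```

Recombination between two sites erodes LD over generations, so, averaged over many SNP pairs, weaker LD at a given physical distance points to a higher historical recombination rate; LDhat formalizes this with a coalescent-based likelihood rather than using r^2 directly.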

Our results agree nicely with the broad-scale experimental estimates, but reveal a quite unusual landscape at the fine scale. In particular, we find that canine recombination is strongly enriched in regions with high CpG content. As such, recombination rates are very high around the CpG-rich regions associated with gene promoters. This contrasts with other mammalian species, in which recombination hotspots do not show any particularly strong affinity for gene promoter regions. However, it is reminiscent of patterns seen in Prdm9 knock-out mice, which, although infertile, still produce double-strand breaks that cluster in gene promoter regions [9].
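"High CpG content" is commonly quantified per window by the GC fraction and the observed/expected CpG ratio, count(CG) * length / (count(C) * count(G)). This sketch computes both for a sequence string. The window, the example sequence and any thresholds are assumptions, not the criteria used in the paper; a commonly used CpG-island rule of thumb is a GC fraction above 0.5 together with an observed/expected ratio above 0.6.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the number of CpG dinucleotides.
    return seq.count("CG") * len(seq) / (c * g)


window = "ATCGCGGACGTTCGCA"  # hypothetical 16-bp window
print(gc_fraction(window), cpg_obs_exp(window))
```

Scanning these two statistics along the genome in fixed-size windows and relating them to estimated recombination rates is one straightforward way to examine the kind of association described above.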

Interestingly, the dog genome is known to have very high CpG content. One mechanism previously suggested to explain this is biased gene conversion, which can result in the preferential transmission of G-C alleles over A-T alleles in the vicinity of recombination events. To investigate this phenomenon, we also sequenced a related fox species, which allowed us to see whether G-C alleles are being gained or lost around recombination hotspots. We see that dog recombination hotspots do indeed appear to be acquiring GC content. This could imply a runaway process in which CpG-rich regions become recombinogenic, consequently acquire more GC content, and hence become even more recombinogenic.
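Asking whether hotspots are acquiring GC content amounts to polarizing substitutions with an outgroup and comparing weak-to-strong (A/T to G/C) with strong-to-weak changes. This sketch assumes the simplest parsimony rule, treating the fox base as ancestral wherever aligned dog and fox bases differ, and it ignores multiple hits and polymorphism; the study's actual pipeline is not described here, and the sequences and function name are hypothetical. Under GC-biased gene conversion, an excess of weak-to-strong changes would be expected in hotspots relative to flanking sequence.

```python
WEAK = set("AT")
STRONG = set("GC")


def count_ws_substitutions(dog_bases, fox_bases):
    """Count weak->strong and strong->weak changes on the dog lineage.

    Assumes the fox base is ancestral wherever the two aligned bases differ.
    """
    if len(dog_bases) != len(fox_bases):
        raise ValueError("sequences must be aligned to the same length")
    w_to_s = s_to_w = 0
    for derived, ancestral in zip(dog_bases.upper(), fox_bases.upper()):
        if derived == ancestral:
            continue
        if ancestral in WEAK and derived in STRONG:
            w_to_s += 1
        elif ancestral in STRONG and derived in WEAK:
            s_to_w += 1
    return w_to_s, s_to_w


dog = "GCGATCGC"  # hypothetical aligned dog sequence
fox = "ACGTCTGA"  # hypothetical aligned fox sequence
print(count_ws_substitutions(dog, fox))
```

Weak-to-weak and strong-to-strong differences (such as A versus T) are deliberately left uncounted, since they do not change GC content.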

As such, our results show that recombination in the dog genome appears to have some quite interesting properties. However, questions remain. The loss of PRDM9 in dogs appears to have resulted in some qualitative features that are consistent with knock-out mice, and yet dogs somehow avoid the associated infertility. Perhaps canine meiosis manages to complete without a PRDM9 ortholog, or perhaps an as-yet-unknown gene in the dog genome has adopted the role of PRDM9. In either case, the investigation of recombination in dogs provides a valuable means for building our understanding of how recombination occurs and its importance in shaping the genome.

1. Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, et al. (2010) PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science 327: 836-840.
2. Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, et al. (2010) Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science 327: 876-879.
3. Parvanov ED, Petkov PM, Paigen K (2010) Prdm9 controls activation of mammalian recombination hotspots. Science 327: 835.
4. Flachs P, Mihola O, Simecek P, Gregorova S, Schimenti JC, et al. (2012) Interallelic and intergenic incompatibilities of the Prdm9 (Hst1) gene in mouse hybrid sterility. PLoS Genet 8: e1003044.
5. Mihola O, Trachtulec Z, Vlcek C, Schimenti JC, Forejt J (2009) A mouse speciation gene encodes a meiotic histone H3 methyltransferase. Science 323: 373-375.
6. Oliver PL, Goodstadt L, Bayes JJ, Birtle Z, Roach KC, et al. (2009) Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa. PLoS Genet 5: e1000753.
7. Wong AK, Ruhe AL, Dumont BL, Robertson KR, Guerrero G, et al. (2010) A comprehensive linkage map of the dog genome. Genetics 184: 595-605.
8. Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K (2012) Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res 22: 51-63.
9. Brick K, Smagulova F, Khil P, Camerini-Otero RD, Petukhova GV (2012) Genetic recombination is directed away from functional genomic elements in mice. Nature 485: 642-645.

Genetic recombination is targeted towards gene promoter regions in dogs

Adam Auton, Ying Rui Li, Jeffrey Kidd, Kyle Oliveira, Julie Nadel, J. Kim Holloway, Jessica J. Howard, Paula E. Cohen, John M. Greally, Jun Wang, Carlos D. Bustamante, Adam R. Boyko
(Submitted on 28 May 2013)

The identification of the H3K4 trimethylase, PRDM9, as the gene responsible for recombination hotspot localization has provided considerable insight into the mechanisms by which recombination is initiated in mammals. However, uniquely amongst mammals, canids appear to lack a functional version of PRDM9 and may therefore provide a model for understanding recombination that occurs in the absence of PRDM9, and thus how PRDM9 functions to shape the recombination landscape. We have constructed a fine-scale genetic map from patterns of linkage disequilibrium assessed using high-throughput sequence data from 51 free-ranging dogs, Canis lupus familiaris. Compared to genetic maps obtained in other mammalian species, the canine map is notably different at the fine scale. While broad-scale patterns exhibit typical properties, our fine-scale estimates indicate that recombination is more uniformly distributed than has been observed in other mammalian species. In addition, highly elevated recombination rates are observed in the vicinity of CpG-rich regions, including gene promoter regions, but show little association with H3K4 trimethylation marks identified in spermatocytes. Finally, by comparison to genomic data from the Andean fox, Lycalopex culpaeus, we show that biased gene conversion is a plausible mechanism by which the high CpG content of the dog genome could have arisen.