Evolution in Eggs and Phases: experimental evolution of fecundity and reproductive timing in Caenorhabditis elegans

Posted on March 4, 2016 by schraib

Bradly J Alicea

bioRxiv doi: http://dx.doi.org/10.1101/042143

To examine the role of natural selection on fecundity in a variety of Caenorhabditis elegans genetic backgrounds, we used an experimental evolution protocol to evolve 14 distinct genetic strains over 15-20 generations. Beginning with three founder worms for each strain, we were able to generate 790 distinct genealogies, which provided information on both the effects of natural selection and the evolvability of each strain. Among these genotypes are a wildtype (N2) and a collection of mutants with targeted mutations in the daf-c, daf-d, and AMPK pathways. The overarching goal of our analysis is two-fold: to observe differences in reproductive fitness and to observe related changes in reproductive timing. This yields two outcomes. The first is that the majority of selective effects on fecundity occur during the first few generations of evolution, while the negative selection for reproductive timing occurs on longer timescales. The second finding reveals that positive selection on fecundity results in positive and negative selection on reproductive timing, both of which are strain-dependent. Using a derivative of population size per generation called the reproductive carry-over (RCO) measure, we find that the fluctuation and shape of the probability distribution may be informative in terms of developmental selection. While these consist of general patterns that transcend mutations in a specific gene, changes in the RCO measure may nevertheless be products of selection. In conclusion, we discuss the broader implications of these findings, particularly in the context of genotype-fitness maps and the role of uncharacterized mutations in individual variation and evolvability.

Demographic inference under the coalescent in a spatial continuum

Posted on March 4, 2016 by schraib

Stephane Guindon, Hongbin Guo, David Welch

bioRxiv doi: http://dx.doi.org/10.1101/042135

Understanding population dynamics from the analysis of molecular and spatial data requires sound statistical modeling. Current approaches assume that populations are naturally partitioned into discrete demes, thereby failing to be relevant in cases where individuals are scattered on a spatial continuum. Other models predict the formation of increasingly tight clusters of individuals in space, which, again, conflicts with biological evidence. Building on recent theoretical work, we introduce a new genealogy-based inference framework that alleviates these issues. This approach effectively implements a stochastic model in which the distribution of individuals is homogeneous and stationary, thereby providing a relevant null model for the fluctuation of genetic diversity in time and space. Importantly, the spatial density of individuals in a population and their range of dispersal during the course of evolution are two parameters that can be inferred separately with this method. The validity of the new inference framework is confirmed with extensive simulations and the analysis of influenza sequences collected over five seasons in the USA.

Genotyping Allelic and Copy Number Variation in the Immunoglobulin Heavy Chain Locus

Posted on March 4, 2016 by schraib

Shishi Luo, Jane A Yu, Yun S. Song

bioRxiv doi: http://dx.doi.org/10.1101/042226

The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a genomic region that is known to vary in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here, we establish a convention of representing the locus in terms of a reference panel of operationally distinguishable segments defined by hierarchical clustering. Using this reference set, we develop a pipeline that identifies copy number and allelic variation in the IGHV locus from whole-genome sequencing reads. Tests on simulated reads demonstrate that our approach is feasible and accurate for detecting the presence and absence of gene segments using reads as short as 70 bp. With reads 100 bp and longer, coverage depth can also be used to determine copy number. When applied to a family of European ancestry, our method finds new copy number variants and confirms existing variants. This study paves the way for analyzing population-level patterns of variation in the IGHV locus in larger diverse datasets and for quantitatively handling regions of copy number variation in other structurally varying and complex loci.
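
The depth-based step mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' pipeline: the function name, the idea of a known single-copy baseline depth, and all the numbers below are assumptions made purely for illustration. The sketch divides the mean read depth over a segment by the depth expected for one copy and rounds to the nearest whole number.

```python
import math


def estimate_copy_number(segment_depth: float, single_copy_depth: float) -> int:
    """Estimate copy number as the rounded ratio of observed to single-copy depth.

    Both arguments are hypothetical inputs: the mean read depth over a segment
    and the mean depth expected for exactly one copy of a segment.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    if segment_depth < 0:
        raise ValueError("segment_depth must be non-negative")
    ratio = segment_depth / single_copy_depth
    # Round half up; Python's built-in round() would round halves to even.
    return math.floor(ratio + 0.5)


# Made-up depths for four segments, with an assumed single-copy depth of 15x.
example_depths = {"seg_a": 0.0, "seg_b": 14.2, "seg_c": 31.0, "seg_d": 44.1}
example_calls = {
    name: estimate_copy_number(depth, 15.0) for name, depth in example_depths.items()
}
```

A ratio near zero is read as absence of the segment, which matches the presence/absence calls the abstract says are possible even with shorter reads; a real method would also need to handle depth noise and reads that map ambiguously between similar segments.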

The regulator-executor-phenotype architecture shaped by natural selection

Posted on March 4, 2016 by schraib

Han Chen, Chung-I Wu, Xionglei He

bioRxiv doi: http://dx.doi.org/10.1101/026443

Genotype-phenotype relationships are a central focus of modern genetics. While deletion analyses have uncovered many regulatory genes of specific traits, it remains largely unknown how these regulators execute their commands through downstream genes, or executors. Here, we wish to know the number of executors for each trait, their relationships with the regulators and the role natural selection may play in shaping the regulator-executor-phenotype architecture. By analyzing ~500 morphological traits of the yeast Saccharomyces cerevisiae we found that a trait is often controlled directly by a large number of executors, the expressions of which are affected by regulators. By recruiting a set of “coordinating” regulators, natural selection helps organize the large number of executors into a small number of co-expression modules. This way, the individual executors can be readily recognized by observational approaches that examine the statistical association between gene activity and trait. When the trait is subject to little or no selection, however, the executors are controlled only by “non-coordinating” regulators that evolve passively and do not build the executors’ co-expression. As a result, none of the executors retain a statistically tractable relationship with the trait. Thus, natural selection, by governing some traits strongly (such as fertility) and others weakly (such as aging-related phenotypes), profoundly influences the genotype-phenotype relationships as well as their tractability.

Powerful decomposition of complex traits in a diploid model using Phased Outbred Lines

Posted on March 4, 2016 by schraib

Johan Hallin, Kaspar Martens, Alexander Young, Martin Zackrisson, Francisco Salinas, Leopold Parts, Jonas Warringer, Gianni Liti

bioRxiv doi: http://dx.doi.org/10.1101/042176

Explaining trait differences between individuals is a core but challenging aim of life sciences. Here, we introduce a powerful framework for complete decomposition of trait variation into its underlying genetic causes in diploid model organisms. We intercross two natural genomes over many sexual generations, sequence and systematically pair the recombinant gametes into a large array of diploid hybrids with fully assembled and phased genomes, termed Phased Outbred Lines (POLs). We demonstrate the capacity of the framework by partitioning fitness traits of 7310 yeast POLs across many environments, achieving near complete trait heritability (mean H² = 91%) and precisely estimating additive (74%), dominance (8%), second (9%) and third (1.8%) order epistasis components. We found nonadditive quantitative trait loci (QTLs) to outnumber (3:1) but to be weaker than additive loci; dominant contributions to heterosis to outnumber overdominant (3:1); and pleiotropy to be the rule rather than the exception. The POL approach presented here offers the most complete decomposition of diploid traits to date and can be adapted to most model organisms.

Transposable Element Evolution in the Allotetraploid Capsella bursa-pastoris and the Perfect Storm Hypothesis

Posted on March 4, 2016 by schraib

J Arvid Agren, Hui-Run Huang, Stephen I Wright

bioRxiv doi: http://dx.doi.org/10.1101/042325

Premise of the study: Shifts in ploidy level will affect the evolutionary dynamics of genomes in a myriad of ways. Population genetic theory predicts that transposable element (TE) proliferation may follow because the genome-wide efficacy of selection should be reduced and the increase in gene copies may mask the deleterious effects of TE insertions. However, to date the evidence of TE proliferation following an increase in ploidy is mixed, with some studies reporting results consistent with this scenario and others reporting signs of genome downsizing. Methods: We used high-coverage whole genome sequence data to evaluate the abundance, genomic distribution, and population frequencies of TEs in the self-fertilizing recent allotetraploid Capsella bursa-pastoris, a species with prior evidence for genome-wide reductions in selection at the amino acid level since the transition to selfing. We then compared the C. bursa-pastoris TE profile with that of its two parental species, outcrossing C. grandiflora and self-fertilizing C. orientalis. Key results: We found no evidence that C. bursa-pastoris has experienced a large proliferation of TEs. Instead, the abundance, both overall and near genes, as well as the population frequencies of TEs, are intermediate between those of its two parental species C. grandiflora and C. orientalis. Conclusions: The lack of shift in TE profile beyond additivity expectations in C. bursa-pastoris may be due to a variety of factors. In general, we argue that allopolyploid lineages that retain high outcrossing should provide a “perfect storm” for TE proliferation, while highly selfing polyploids may generally experience TE loss.

The roles of LINEs, LTRs and SINEs in lineage-specific gene family expansions in the human and mouse genomes

Posted on March 4, 2016 by schraib

Václav Janoušek, Christina M Laukaitis, Alexey Yanchukov, Robert Karn

bioRxiv doi: http://dx.doi.org/10.1101/042309

We explored genome-wide patterns of retrotransposon (RT) content surrounding lineage-specific gene family expansions in the human and mouse genomes. Our results suggest that the size of a gene family is an important predictor of the RT distribution in close proximity to the family members. The distribution differs considerably between the three most common RT classes (LINEs, LTRs and SINEs). LINEs and LTRs tend to be more abundant around genes of multi-copy gene families, whereas SINEs tend to be depleted around such genes. Detailed analysis of the distribution and diversity of LINEs and LTRs with respect to gene family size suggests that each has a distinct involvement in gene family expansion. LTRs are associated with open chromatin sites surrounding the gene families, supporting their involvement in gene regulation, whereas LINEs may play a structural role, promoting gene duplication. This suggests that gene family expansions, especially in the mouse genome, might undergo two phases. The first is characterized by elevated deposition of LTRs and their utilization in reshaping gene regulatory networks. The second is characterized by rapid gene family expansion due to continuous accumulation of LINEs, and it appears that, in some instances at least, this could become a runaway process. We provide an example in which this has happened and we present a simulation supporting the possibility of the runaway process. Our observations also suggest that specific differences exist in this gene family expansion process between human and mouse genomes.

Plant root pathogens over 120,000 years of temperate rainforest ecosystem development

Posted on March 4, 2016 by schraib

Ian A. Dickie, Angela Wakelin, Laura Martinez-Garcia, Sarah J. Richardson, Andreas Makiola, Jason M. Tylianakis

bioRxiv doi: http://dx.doi.org/10.1101/042341

The role of pathogens, including oomycetes, in long-term ecosystem development has remained largely unknown, despite hypotheses that pathogens drive primary succession, determine mature ecosystem plant diversity, or dominate in retrogressive, nutrient-limited ecosystems. Using DNA sequencing from roots, we investigated the frequency and host relationships of oomycete communities along a 120 000 year glacial chronosequence. Oomycetes were frequent in early successional sites (5 – 70 yrs), occurring in 38 – 65% of plant roots, but rare (average 3%) in all older ecosystems (280 yrs and older). Oomycetes were highly host specific, and more frequent on plant species that declined most strongly in abundance between ecosystem ages. In contrast, oomycetes were not correlated with plant abundance or plant root traits associated with retrogression. The results support the importance of root pathogens in early succession, but not thereafter, suggesting root pathogen-driven dynamics may be important in driving succession but not long-term diversity maintenance.

Phylogeny-aware Identification and Correction of Taxonomically Mislabeled Sequences

Posted on March 4, 2016 by schraib

Alexey M Kozlov, Jiajie Zhang, Pelin Yilmaz, Frank Oliver Glockner, Alexandros Stamatakis

bioRxiv doi: http://dx.doi.org/10.1101/042200

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labour-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity / 91.7% precision) as well as correction (94.9% sensitivity / 89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
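
For readers less familiar with the two accuracy figures quoted in the abstract, here is a minimal sketch of how sensitivity and precision are computed from counts of true positives, false negatives and false positives. The function names and the counts are made up for illustration; they are not taken from the SATIVA evaluation and do not reproduce its reported percentages.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of genuinely mislabeled sequences that were flagged."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged sequences that were genuinely mislabeled."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 real mislabels, of which 90 are flagged,
# plus 30 correctly labeled sequences flagged by mistake.
example_sensitivity = sensitivity(true_pos=90, false_neg=10)  # 0.9
example_precision = precision(true_pos=90, false_pos=30)      # 0.75
```

High sensitivity with lower precision means few mislabels are missed but some flagged sequences still need a second look, which is why the two numbers are reported together.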

Non-Identifiable Pedigrees and a Bayesian Solution

Posted on March 4, 2016 by schraib

B. Kirkpatrick

Some methods aim to correct or test for relationships or to reconstruct the pedigree, or family tree. We show that these methods cannot resolve ties for correct relationships due to non-identifiability of the pedigree likelihood, which is the probability of inheriting the data under the pedigree model. This means that no likelihood-based method can produce a correct pedigree inference with high probability. This lack of reliability is critical both for health and forensics applications.
In this paper we present the first discussion of multiple typed individuals in non-isomorphic pedigrees, P and Q, where the likelihoods are non-identifiable, Pr[G | P,θ]=Pr[G | Q,θ], for all input data G and all recombination rate parameters θ. While there were previously known non-identifiable pairs, we give an example having data for multiple individuals.
Additionally, we provide a deeper understanding of the general discrete structures driving these non-identifiability examples, as well as results to guide algorithms that wish to examine only identifiable pedigrees. This paper introduces a general criterion for establishing whether a pair of pedigrees is non-identifiable and two easy-to-compute criteria guaranteeing identifiability. Finally, we suggest a method for dealing with non-identifiable likelihoods: use Bayes' rule to obtain the posterior from the likelihood and prior. We propose a prior guaranteeing that the posterior distinguishes all pairs of pedigrees.
Shortened version published as: B. Kirkpatrick. Non-identifiable pedigrees and a Bayesian solution. Int. Symp. on Bioinformatics Res. and Appl. (ISBRA), 7292:139-152 2012.
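
The Bayesian suggestion in the abstract can be illustrated with a small sketch. When two pedigrees P and Q have the same likelihood for every dataset, the likelihood cancels in Bayes' rule and the posterior ratio equals the prior ratio, so any prior that gives P and Q different probabilities separates them. The likelihood and prior values below are invented for illustration, and the prior is not the one proposed in the paper.

```python
def posterior(likelihoods: dict, priors: dict) -> dict:
    """Apply Bayes' rule over a finite set of candidate pedigrees.

    likelihoods[k] stands for Pr[G | k, theta] and priors[k] for Pr[k];
    the result is Pr[k | G, theta] normalized over the candidates given.
    """
    joint = {k: likelihoods[k] * priors[k] for k in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("all candidates have zero joint probability")
    return {k: value / total for k, value in joint.items()}


# Hypothetical non-identifiable pair: identical likelihoods, unequal priors.
example_likelihoods = {"P": 0.02, "Q": 0.02}
example_priors = {"P": 0.75, "Q": 0.25}
example_posterior = posterior(example_likelihoods, example_priors)
```

With equal likelihoods the posterior simply returns the prior (here roughly 0.75 for P and 0.25 for Q), so the data contribute nothing to choosing between the two pedigrees and the prior does all the work.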

Haldane's Sieve

Discussing preprints in population and evolutionary genetics
