Detecting range expansions from genetic data

Benjamin M Peter, Montgomery Slatkin
(Submitted on 29 Mar 2013)

We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic for pairs of populations \psi (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum caused by the series of founder events that happen during an expansion. Such asymmetry arises because low frequency alleles tend to be lost during founder events, thus creating clines in the frequencies of surviving low-frequency alleles. Using simulations, we further show that \psi is more powerful for detecting range expansions than both F_{ST} and clines in heterozygosity. We illustrate the utility of \psi by applying it to a data set from modern humans and show how we can include more complicated scenarios such as multiple expansion origins or barriers to migration in the model.
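The abstract does not give the formula for \psi, so the following is only a toy sketch of the general idea: an asymmetry statistic computed over alleles shared by two populations. The function name `asymmetry_index` and the choice of statistic (mean difference in derived-allele frequency over sites where the allele is present in both populations) are assumptions made for illustration and should not be read as the authors' definition.

```python
from typing import Sequence


def asymmetry_index(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Toy pairwise asymmetry statistic (hypothetical, not the paper's psi).

    freqs_a[i] and freqs_b[i] are derived-allele frequencies of site i in
    populations A and B. Only sites where the derived allele is present in
    both populations are used. A positive value means shared alleles tend
    to be at higher frequency in A than in B.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    # Keep only sites where the derived allele is present in both populations.
    shared = [(a, b) for a, b in zip(freqs_a, freqs_b) if a > 0 and b > 0]
    if not shared:
        raise ValueError("no shared derived alleles")
    return sum(a - b for a, b in shared) / len(shared)


# Example: the third site is absent from population A and is ignored.
pop_a = [0.1, 0.2, 0.0]
pop_b = [0.3, 0.4, 0.5]
value = asymmetry_index(pop_a, pop_b)
```

By construction the statistic is antisymmetric: swapping the two populations flips its sign, which is the kind of property one would want from a directional measure.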

Most viewed on Haldane’s Sieve, March 2013

The most viewed preprints on Haldane’s Sieve in March 2013 were:

The effects of transcription factor competition on gene regulation

Nicolae Radu Zabet, Boris Adryan
(Submitted on 27 Mar 2013)

We performed stochastic simulations of transcription factor (TF) molecules translocating by facilitated diffusion (a combination of 3D diffusion in the cytoplasm and 1D random walk on the DNA), and considered various abundances of cognate and non-cognate TFs to assess the influence of competitor molecules that also move along the DNA. We show that molecular crowding on the DNA always leads to longer times required by TF molecules to locate their target sites as well as to lower occupancy, which may provide a general mechanism to control gene activity levels globally. Finally, we show that crowding on the DNA may increase transcriptional noise through increased variability of the occupancy time of the target sites.

The influence of transcription factor competition on the relationship between occupancy and affinity

Nicolae Radu Zabet, Robert Foy, Boris Adryan
(Submitted on 27 Mar 2013)

Transcription factors (TFs) are proteins that bind to specific sites on the DNA and regulate gene activity. Identifying where TF molecules bind and how much time they spend on their target sites is key for understanding transcriptional regulation. It is usually assumed that the free energy of binding of a TF to the DNA (the affinity of the site) is highly correlated to the amount of time the TF remains bound (the occupancy of the site). However, knowing the binding energy is not sufficient to infer actual binding site occupancy. This mismatch between the occupancy predicted by the affinity and the observed occupancy may be caused by various factors, such as TF abundance, competition between TFs or the arrangement of the sites on the DNA. We investigated the relationship between the affinity of a TF for a set of binding sites and their occupancy. In particular, we considered the case of lac repressor (lacI) in E. coli and performed stochastic simulations of the TF dynamics on the DNA for various combinations of lacI abundance in competition with TFs that contribute to macromolecular crowding. Our results showed that for medium and high affinity sites, TF competition does not play a significant role in genomic occupancy, except in cases when the abundance of lacI is significantly increased or when a low-information content PWM was used. Nevertheless, for medium and low affinity sites, an increase in TF abundance (of either lacI or other molecules) leads to an increase in occupancy at several sites.
Keywords: facilitated diffusion, Position Weight Matrix, thermodynamic equilibrium, motif information content, molecular crowding

SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees

Dan DeBlasio, Jennifer Wisecaver
(Submitted on 22 Mar 2013)

We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy to use, adaptable, and high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. With SICLE it is possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. The program is a simple command line utility and is easy to adapt and implement in any phylogenetic pipeline. As a test case, we applied this new tool to published gene phylogenies to identify potential instances of horizontal gene transfer in Salinibacter ruber.
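To make the idea of sister clade extraction concrete, here is a minimal sketch in Python. It does not use SICLE or reproduce its interface; the `Node` class, the `sister_clades` function, and the simplification that the node of interest is a single leaf are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: leaves have a name, internal nodes have children."""
    name: str = ""
    support: Optional[float] = None  # support value of the clade rooted here
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the leaf names below a node, in tree order."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def sister_clades(root: Node, target: str) -> Optional[Tuple[List[List[str]], Optional[float]]]:
    """Find the sister clades of the leaf `target`.

    Returns (sisters, support), where `sisters` lists the leaf names of each
    sibling subtree and `support` is the support value stored on the parent
    that joins them. Returns None if `target` is not in the tree.
    """
    for child in root.children:
        if not child.children and child.name == target:
            sisters = [sorted(leaves(s)) for s in root.children if s is not child]
            return sisters, root.support
        found = sister_clades(child, target)
        if found is not None:
            return found
    return None


# Example tree ((A,B)0.95,C): A and B form a clade with support 0.95.
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node("C"),
])
result = sister_clades(tree, "A")
```

In a pipeline setting, a function like this would be run over many gene trees and the results tallied, so that recurring sister relationships with high support stand out for follow-up.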

The Convergence of eQTL Mapping, Heritability Estimation and Polygenic Modeling: Emerging Spectrum of Risk Variation in Bipolar Disorder

Eric R. Gamazon, Hae Kyung Im, Chunyu Liu, Members of the Bipolar Disorder Genome Study (BiGS) Consortium, Dan L. Nicolae, Nancy J. Cox
(Submitted on 25 Mar 2013)

It is widely held that a substantial genetic component underlies Bipolar Disorder (BD) and other neuropsychiatric disease traits. Recent efforts have been aimed at understanding the genetic basis of disease susceptibility, with genome-wide association studies (GWAS) unveiling some promising associations. Nevertheless, the genetic etiology of BD remains elusive with a substantial proportion of the heritability – which has been estimated to be 80% based on twin and family studies – unaccounted for by the specific genetic variants identified by large-scale GWAS. Furthermore, functional understanding of associated loci generally lags discovery. Studies we report here provide considerable support to the claim that substantially more remains to be gained from GWAS on the genetic mechanisms underlying BD susceptibility, and that a large proportion of the variation in disease risk may be uncovered through integrative functional genomic approaches. We combine recent analytic advances in heritability estimation and polygenic modeling and leverage recent technological advances in the generation of -omics data to evaluate the nature and scale of the contribution of functional classes of genetic variation to a relatively intractable disorder. We identified cis eQTLs in cerebellum and parietal cortex that capture more than half of the total heritability attributable to SNPs interrogated through GWAS and showed that eQTL-based heritability estimation is highly tissue-dependent. Our findings show that a much greater resolution may be attained than has been reported thus far on the number of common loci that capture a substantial proportion of the heritability to disease risk and that the functional nature of contributory loci may be clarified en masse.

Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM

Heng Li
(Submitted on 16 Mar 2013)

Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs split alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths from 70bp to a few megabases. For short-read mapping, BWA-MEM shows better performance than several state-of-the-art read aligners to date.
Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at this http URL

Major changes in the core developmental pathways of nematodes: Romanomermis culicivorax reveals the derived status of the Caenorhabditis elegans model

Philipp H. Schiffer, Michael Kroiher, Christopher Kraus, Georgios D. Koutsovoulos, Sujai Kumar, Julia I. R. Camps, Ndifon A. Nsah, Dominik Stappert, Krystalynne Morris, Peter Heger, Janine Altmüller, Peter Frommolt, Peter Nürnberg, W. Kelley Thomas, Mark L. Blaxter, Einhard Schierenberg
(Submitted on 17 Mar 2013)

Background: Despite its status as a model organism, the development of Caenorhabditis elegans is not necessarily archetypical for nematodes. The phylum Nematoda is divided into the Chromadorea (includes C. elegans) and the Enoplea. Compared to C. elegans, enoplean nematodes have very different patterns of cell division and determination. Embryogenesis of the enoplean Romanomermis culicivorax has been studied in great detail, but the genetic circuitry underpinning development in this species is unknown.
Results: We created a draft genome of R. culicivorax and compared its developmental gene content with that of two nematodes, C. elegans and Trichinella spiralis (another enoplean), and a representative arthropod Tribolium castaneum. This genome evidence shows that R. culicivorax retains components of the conserved metazoan developmental toolkit lost in C. elegans. T. spiralis has independently lost even more of the toolkit than has C. elegans. However, the C. elegans toolkit is not simply depauperate, as many genes essential for embryogenesis in C. elegans are unique to this lineage, or have only extremely divergent homologues in R. culicivorax and T. spiralis. These data imply fundamental differences in the genetic programmes for early cell specification, inductive interactions, vulva formation and sex determination.
Conclusions: Thus nematodes, despite their apparent phylum-wide morphological conservatism, have evolved major differences in the molecular logic of their development. R. culicivorax serves as a tractable, contrasting model to C. elegans for understanding how divergent genomic and thus regulatory backgrounds can generate a conserved phenotype. The availability of the draft genome will promote use of R. culicivorax as a research model.

Loss and Recovery of Genetic Diversity in Adapting Populations of HIV

Pleuni Pennings, Sergey Kryazhimskiy, John Wakeley
(Submitted on 15 Mar 2013)

A population’s adaptive potential is the likelihood that it will adapt in response to an environmental challenge, e.g., develop resistance in response to drug treatment. The effective population size inferred from genetic diversity at neutral sites has been traditionally taken as a major predictor of adaptive potential. However, recent studies demonstrate that such effective population size vastly underestimates the population’s adaptive potential (Karasov 2010).
Here we use data from treated HIV-infected patients (Bacheler 2000) to estimate the effective size of HIV populations relevant for adaptation. Our estimate is based on the frequencies of soft and hard selective sweeps of a known resistance mutation K103N. We observe that 41% of HIV populations in this study acquire resistance via at least two functionally equivalent but distinct mutations which sweep to fixation without significantly reducing genetic diversity at neighboring sites (soft selective sweeps). We further estimate that 20% of populations acquire a resistant allele via a single mutation that sweeps to fixation and drastically reduces genetic diversity (hard selective sweeps). We infer that the effective population size that determines the adaptive potential of within-patient HIV populations is approximately 150,000. Our estimate is two orders of magnitude higher than a classical estimate based on diversity at synonymous sites.
Three not mutually exclusive reasons can explain this discrepancy:
(1) some synonymous mutations may be under selection;
(2) highly beneficial mutations may be less affected by ongoing linked selection than synonymous mutations; and
(3) synonymous diversity may not be at its expected equilibrium because it recovers slowly from sweeps and bottlenecks.

Sensitive Long-Indel-Aware Alignment of Sequencing Reads

Tobias Marschall, Alexander Schönhuth
(Submitted on 14 Mar 2013)

The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing as performed in the 1000 Genomes project and the Genome of the Netherlands project possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. find all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole genome data. We compare performance to BWA, bowtie2 and stampy, and find that our method is especially advantageous on reads containing larger indels.