# The equivalence between weak and strong purifying selection

Benjamin H Good, Michael M Desai
(Submitted on 16 Oct 2012)

Weak purifying selection, acting on many linked mutations, may play a major role in shaping patterns of molecular evolution in natural populations. Yet efforts to infer these effects from DNA sequence data are limited by our incomplete understanding of weak selection on local genomic scales. Here, we demonstrate a natural symmetry between weak and strong selection, in which the effects of many weakly selected mutations on patterns of molecular evolution are equivalent to a smaller number of more strongly selected mutations. By introducing a coarse-grained “effective selection coefficient,” we derive an explicit mapping between weakly selected populations and their strongly selected counterparts, which allows us to make accurate and efficient predictions across the full range of selection strengths. This suggests that an effective selection coefficient and an effective mutation rate, rather than an effective population size, are the most accurate summary of the effects of selection over locally linked regions. Moreover, this correspondence places fundamental limits on our ability to resolve the effects of weak selection from contemporary sequence data alone.

# Fluctuations in fitness distributions and the effects of weak linked selection on sequence evolution

Benjamin H. Good, Michael M. Desai
(Submitted on 15 Oct 2012)

Evolutionary dynamics and patterns of molecular evolution are strongly influenced by selection on linked regions of the genome, but our quantitative understanding of these effects remains incomplete. Recent work has focused on predicting the distribution of fitness within an evolving population, and this forms the basis for several methods that leverage the fitness distribution to predict the patterns of genetic diversity when selection is strong. However, in weakly selected populations, random fluctuations due to genetic drift are more severe, and neither the distribution of fitness nor the sequence diversity within the population is well understood. Here, we briefly review the motivations behind the fitness-distribution picture, and summarize the general approaches that have been used to analyze this distribution in the strong-selection regime. We then extend these approaches to the case of weak selection, by outlining a perturbative treatment of selection at a large number of linked sites. This allows us to quantify the stochastic behavior of the fitness distribution and yields exact analytical predictions for the sequence diversity and substitution rate in the limit that selection is weak.

# A 454 survey of the community composition and core microbiome of the common bed bug, Cimex lectularius, reveals significant microbial community structure across an urban landscape

Matthew Meriweather, Sara Matthews, Rita Rio, Regina S Baucom
(Submitted on 13 Oct 2012)

Elucidating the spatial dynamics and core constituents of the microbial communities found in association with arthropod hosts is of crucial importance for insects that may vector human or agricultural pathogens. The hematophagous Cimex lectularius, known as the common bed bug, has made a recent resurgence in North America, as well as worldwide, potentially owing to increased travel and resistance to insecticides. A comprehensive survey of the bed bug microbiome has not been performed to date, nor has an assessment of the spatial dynamics of its microbiome. Here we present a survey of bed bug microbial communities by amplifying the V4-V6 hypervariable region of the 16S rDNA gene region followed by 454 Titanium sequencing using 31 individuals from eight natural populations collected from residences in Cincinnati, OH. Across all samples, 97% of the microbial community is made up of two dominant OTUs identified as the $\alpha$-proteobacterium Wolbachia and an unnamed $\gamma$-proteobacterium from the Enterobacteriaceae. Microbial communities varied among host populations for measures of community diversity and exhibited significant population structure. We also uncovered a strong negative correlation in the abundance of the two dominant OTUs, suggesting they may fulfill similar roles as nutritional mutualists. This broad survey represents the most comprehensive assessment, to date, of the microbes that associate with bed bugs, and uncovers evidence for potential antagonism between the two dominant members of the bed bug microbiome.
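As a rough illustration of what a negative correlation between two OTU abundances means in practice, the sketch below converts per-sample read counts to relative abundances and computes a Pearson correlation. The abstract does not say which statistic the authors used, so Pearson is an assumption here. The counts are invented toy numbers, not data from the study, and the helper names `relative_abundance` and `pearson` and the label `Enterobacteriaceae_OTU` are hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per OTU into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: c / total for otu, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


# Toy read counts for four hypothetical bed bug samples (NOT real data).
samples = [
    {"Wolbachia": 900, "Enterobacteriaceae_OTU": 80, "other": 20},
    {"Wolbachia": 600, "Enterobacteriaceae_OTU": 370, "other": 30},
    {"Wolbachia": 300, "Enterobacteriaceae_OTU": 680, "other": 20},
    {"Wolbachia": 100, "Enterobacteriaceae_OTU": 870, "other": 30},
]

rel = [relative_abundance(s) for s in samples]
wolbachia = [r["Wolbachia"] for r in rel]
gamma = [r["Enterobacteriaceae_OTU"] for r in rel]

r = pearson(wolbachia, gamma)
print(f"Pearson r between the two dominant OTUs: {r:.3f}")
```

One caveat with any such calculation: relative abundances sum to one within a sample, so two taxa that together make up most of a community will tend to be negatively correlated by construction. A toy check like this only shows the mechanics, not how a real analysis would account for compositional effects.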

# Species Identification and Unbiased Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences

Swee Hoe Ong, Vinutha Uppoor Kukkillaya, Andreas Wilm, Christophe Lay, Eliza Xin Pei Ho, Louie Low, Martin Lloyd Hibberd, Niranjan Nagarajan
(Submitted on 12 Oct 2012)

The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to capture more than 90% of sequences in the Greengenes database and with nearly twice the resolution of existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the diversity of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization.

# Modeling the Clonal Evolution of Cancer from Next Generation Sequencing Data

Wei Jiao, Shankar Vembu, Amit G. Deshwar, Lincoln Stein, Quaid Morris
(Submitted on 11 Oct 2012)

We consider the problem of inferring the clonal evolutionary structure of cancer cells from high-throughput next generation sequencing data. We address this problem using statistical machine learning to infer a relational clustering of objects, where the clusters are connected in the form of a rooted tree. We present a hierarchical Bayesian mixture model that uses a non-parametric prior over trees to automatically estimate the number of clones (clusters) and their clonal frequencies (cluster means) in the population, and to identify the phylogenetic relationship between these subclones. Experiments on three real data sets comprising 12 tumor samples from triple-negative breast cancer, acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate the efficacy of our method.

# Integrative modeling of eQTLs and cis-regulatory elements suggest mechanisms underlying cell type specificity of eQTLs

Christopher D Brown, Lara M Mangravite, Barbara E Engelhardt
(Submitted on 11 Oct 2012)

Genetic variants in cis-regulatory elements or trans-acting regulators commonly influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in four parts: first we identified eQTLs from eleven studies on seven cell types; next we quantified cell type specific eQTLs across the studies; then we integrated eQTL data with cis-regulatory element (CRE) data sets from the ENCODE project; finally we built a classifier to predict cell type specific eQTLs. Consistent with prior studies, we demonstrate that allelic heterogeneity is pervasive at cis-eQTLs and that cis-eQTLs are often cell type specific. Replication of eQTLs within and between cell types is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. Using a random forest classifier including 526 CRE data sets as features, we successfully predict the cell type specificity of eQTL SNPs in the absence of gene expression data from the cell type of interest. We anticipate that such integrative, predictive modeling will improve our ability to understand the mechanistic basis of human complex phenotypic variation.

# Identifying and Mapping Cell-type Specific Chromatin Programming of Gene Expression

Troels T. Marstrand, John D. Storey
(Submitted on 11 Oct 2012)

A problem of substantial interest is to systematically map variation in chromatin structure to gene expression regulation across conditions, environments, or differentiated cell types. We developed and applied a quantitative framework for determining the existence, strength, and type of relationship between high-resolution chromatin structure in terms of DNaseI hypersensitivity (DHS) and genome-wide gene expression levels in 20 diverse human cell lines. We show that ~25% of genes show cell-type specific expression explained by alterations in chromatin structure. We find that distal regions of chromatin structure (e.g., +/- 200kb) capture more genes with this relationship than local regions (e.g., +/- 2.5kb), yet the local regions show a more pronounced effect. By exploiting variation across cell types, we were able to pinpoint the most likely hypersensitive sites related to cell-type specific expression, which we show have a range of contextual usages. This quantitative framework is likely applicable to other settings aimed at relating continuous genomic measurements to gene expression variation.