Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize

Jason G Wallace, Peter Bradbury, Nengyi Zhang, Yves Gibon, Mark Stitt, Edward Buckler
doi: http://dx.doi.org/10.1101/010207

Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ~5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ~800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ~50% more likely to have a paralog than expected by chance, indicating that gene regulation and neofunctionalization are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.

RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing

Vikas Gupta, April Dawn Estrada, Ivory Clabaugh Blakley, Rob Reid, Ketan Patel, Mason D. Meyer, Stig Uggerhoj Andersen, Allan F. Brown, Mary Ann Lila, Ann Loraine
doi: http://dx.doi.org/10.1101/010116

Background: Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable breeding berry varieties with enhanced health benefits. Results: Toward this end, we annotated a draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was upregulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and downregulation of metabolic pathway enzymes, cell growth-related genes, and putative transcriptional regulators. Analysis of RNA-Seq alignments also identified developmentally regulated alternative splicing, promoter use, and 3′ end formation. Conclusions: We report genome sequence, gene models, functional annotations, and RNA-Seq expression data, which provide an important new resource enabling high-throughput studies in blueberry. RNA-Seq data are freely available for visualization in Integrated Genome Browser, and analysis code is available from the git repository at http://bitbucket.org/lorainelab/blueberrygenome.

Synonymous and Nonsynonymous Distances Help Untangle Convergent Evolution and Recombination

Peter B. Chi, Sujay Chattopadhyay, Philippe Lemey, Evgeni V. Sokurenko, Vladimir N. Minin
(Submitted on 6 Oct 2014)

When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because different segments of the sequence alignment support different phylogenies. Similarly, convergent selective pressures at the molecular level can also lead to phylogenetic tree incongruence across the sequence alignment. Current methods for detection of phylogenetic incongruence are not equipped to distinguish between these two different mechanisms and assume that the incongruence is a result of recombination or other horizontal transfer of genetic information. We propose a new recombination detection method that can make this distinction, based on synonymous codon substitution distances. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the original Dss statistic when the phylogenetic incongruence signal is due to convergent evolution. We conclude with three empirical examples, where we analyze: 1) sequences from a transmission network of the human immunodeficiency virus, 2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and 3) Hepatitis C virus sequences sampled longitudinally from one patient.

Fitting the Balding-Nichols model to forensic databases

Rori Rohlfs, Vitor R.C. Aguiar, Kirk E. Lohmueller, Amanda M. Castro, Alessandro C.S. Ferreira, Vanessa C.O. Almeida, Iuri D. Louro, Rasmus Nielsen
doi: http://dx.doi.org/10.1101/009969

Large forensic databases provide an opportunity to compare observed empirical rates of genotype matching with those expected under forensic genetic models. A number of researchers have taken advantage of this opportunity to validate some forensic genetic approaches, particularly to ensure that estimated rates of genotype matching between unrelated individuals are indeed slight overestimates of those observed. However, these studies have also revealed systematic error trends in genotype probability estimates. In this analysis, we investigate these error trends and show how they result from inappropriate implementation of the Balding-Nichols model in the context of database-wide matching. Specifically, we show that in addition to accounting for increased allelic matching between individuals with recent shared ancestry, studies must account for relatively decreased allelic matching between individuals with more ancient shared ancestry.
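
For readers unfamiliar with the model, the standard single-locus Balding-Nichols match probabilities can be sketched as below. This is the textbook profile-match form, not the database-wide correction the abstract proposes; the function name and the default `theta` value are assumptions made for illustration only.

```python
def bn_match_probability(p_a, p_b=None, theta=0.01):
    """Probability that a second, unrelated member of the same subpopulation
    carries a genotype, given that the genotype has been observed once.

    p_a, p_b : population allele frequencies; p_b=None means homozygote A/A.
    theta    : coancestry coefficient (default is illustrative, not from the paper).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        # Homozygote A/A
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # Heterozygote A/B
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom
```

With `theta=0` the expressions reduce to the Hardy-Weinberg values `p_a**2` and `2 * p_a * p_b`; a positive `theta` raises the match probability for an already observed genotype, which is the "increased allelic matching" between individuals with recent shared ancestry that the abstract mentions.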

Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders

Robert P Brown, Hane Lee, Ascia Eskin, Gleb Kichaev, Kirk E Lohmueller, Bruno Reversade, Stanley F Nelson, Bogdan Pasaniuc
doi: http://dx.doi.org/10.1101/010017

Recent breakthroughs in exome sequencing technology have made possible the identification of many causal variants of monogenic disorders. Although extremely powerful when closely related individuals (e.g. child and parents) are simultaneously sequenced, exome sequencing of individual-only cases is often unsuccessful due to the large number of variants that need to be followed up for functional validation. Many approaches remove from consideration common variants above a given frequency threshold (e.g. 1%), and then prioritize the remaining variants according to their allele frequency, functional, structural and conservation properties. In this work, we present methods that leverage the genetic structure of different populations while accounting for the finite sample size of the reference panels to improve the variant filtering step. Using simulations and real exome data from individuals with monogenic disorders, we show that our methods significantly reduce the number of variants to be followed up (e.g. a 36% reduction from an average 418 variants per exome when ancestry is ignored to 267 when ancestry is taken into account for case-only sequenced individuals). Most importantly, our proposed approaches are well calibrated with respect to the probability of filtering out a true causal variant (i.e. false negative rate, FNR), whereas existing approaches are susceptible to high FNR when reference panel sizes are limited.
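
To see why panel size matters for the filtering step, here is a minimal sketch of one way a frequency filter could account for sampling noise. It is not the authors' method: it uses a plain one-sided binomial test, and the function names, the 1% threshold default, and the `alpha` level are all assumptions chosen for illustration.

```python
from math import exp, lgamma, log

def binom_tail(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f), with 0 < f < 1, computed in log space."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(f) + (n - i) * log(1 - f))
    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))

def should_filter(alt_count, n_chromosomes, max_freq=0.01, alpha=0.05):
    """Drop a variant only if its count in the reference panel would be
    surprising for a variant whose true frequency is at most max_freq."""
    return binom_tail(alt_count, n_chromosomes, max_freq) < alpha
```

For example, a variant seen 3 times among 200 reference chromosomes has a sample frequency of 1.5%, so a naive 1% cutoff would discard it. `should_filter(3, 200)` keeps it, because three copies are quite plausible for a true frequency of 1% in a panel that small.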

Inference of evolutionary forces acting on human biological pathways

Josephine T Daub, Isabelle Dupanloup, Marc Robinson-Rechavi, Laurent Excoffier
doi: http://dx.doi.org/10.1101/009928

Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald-Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects as well as evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their 2D null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection, but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of non-synonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures.
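
As background for the 2DNS test, the classical single-gene McDonald-Kreitman comparison it builds on can be sketched as follows. This shows only the standard neutrality index and the derived alpha statistic, not the 2DNS decomposition itself; the function name is hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Classical McDonald-Kreitman summary statistics.

    dn, ds : nonsynonymous / synonymous fixed differences between species
    pn, ps : nonsynonymous / synonymous polymorphisms within a species
    Returns (neutrality_index, alpha).
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive for the ratios to be defined")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1 - neutrality_index  # estimated fraction of adaptive nonsynonymous substitutions
    return neutrality_index, alpha
```

With made-up counts, `mk_summary(dn=20, ds=40, pn=10, ps=40)` gives a neutrality index of 0.5. A value below 1 indicates an excess of nonsynonymous divergence, usually read as a signal of positive selection, while a value above 1 indicates an excess of nonsynonymous polymorphism.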

The Drosophila Genome Nexus: a population genomic resource of 605 Drosophila melanogaster genomes, including 197 genomes from a single ancestral range population

Justin Lack, Charis Cardeno, Marc Crepeau, William Taylor, Russ Corbett-Detig, Kristian Stevens, Charles H. Langley, John Pool
doi: http://dx.doi.org/10.1101/009886

Hundreds of wild-derived D. melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach, and settled on an assembly strategy that utilizes two alignment programs and incorporates both SNPs and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets (previous DPGP releases and the DGRP freeze 2.0), and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 605 consistently aligned genomes, and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.

Reticulate speciation and adaptive introgression in the Anopheles gambiae species complex

Jacob Crawford, Michelle M. Riehle, Wamdaogo M. Guelbeogo, Awa Gneme, N’fale Sagnon, Kenneth D. Vernick, Rasmus Nielsen, Brian P. Lazzaro
doi: http://dx.doi.org/10.1101/009837

Species complexes are common, especially among insect disease vectors, and understanding how barriers to gene flow among these populations become established or violated is critical for implementation of vector-targeting disease control. Anopheles gambiae, the primary vector of human malaria in sub-Saharan Africa, exists as a series of ecologically specialized populations that are phylogenetically nested within a species complex. These populations exhibit varying degrees of reproductive isolation, sometimes recognized as distinct subspecies. We have sequenced 32 complete genomes from field-captured individuals of Anopheles gambiae, Anopheles gambiae M form (recently named A. coluzzii), sister species A. arabiensis, and the recently discovered “GOUNDRY” subgroup of A. gambiae that is highly susceptible to Plasmodium. Amidst a backdrop of strong reproductive isolation and adaptive differentiation, we find evidence for adaptive introgression of autosomal chromosomal regions among species and populations. The X chromosome, however, remains strongly differentiated among all of the subpopulations, pointing to a disproportionately large effect of X chromosome genes in driving speciation among anophelines. Strikingly, we find that autosomal introgression has occurred from contemporary hybridization among A. gambiae and A. arabiensis despite strong divergence (~5× higher than autosomal divergence) and isolation on the X chromosome. We find a large region of the X chromosome that has recently swept to fixation in the GOUNDRY subpopulation, which may be an inversion that serves as a partial barrier to gene flow. We also find that the GOUNDRY population is highly inbred, implying increased philopatry in this population. Our results show that ecological speciation in this species complex results in genomic mosaicism of divergence and adaptive introgression that creates a reticulate gene pool connecting vector populations across the speciation continuum with important implications for malaria control efforts.

Most viewed on Haldane’s Sieve: September 2014

This month set a new record for traffic to Haldane’s Sieve; thanks to everyone for the support. The most viewed posts were:

XWAS: a toolset for genetic data analysis and association studies of the X chromosome

Diana Chang, Feng Gao, Alon Keinan
doi: http://dx.doi.org/10.1101/009795

Summary: We present XWAS (chromosome X-Wide Analysis tool-Set), a toolset specially designed for analysis of the X chromosome in association studies, both on the level of single markers and the level of entire genes. It further offers other X-specific analysis tools, including quality control (QC) procedures for X-linked data. We have applied and tested this software by carrying out several X-wide association studies of autoimmune diseases. Availability and Implementation: The XWAS software package, which includes scripts, the binary executable PLINK/XWAS, and all source code, is freely available for download from http://keinanlab.cb.bscb.cornell.edu/content/tools-data. PLINK/XWAS is implemented in C++ and other features in shell scripts and Perl. This software package is designed for Linux systems.