Improving the Efficiency of Genomic Selection in Chinese Simmental beef cattle

Jiangwei Xia, Yang Wu, Huizhong Fang, Wengang Zhang, Yuxin Song, Lupei Zhang, Xue Gao, Yan Chen, Junya Li, Huijiang Gao
doi: http://dx.doi.org/10.1101/022673

Genomic selection is an accurate and efficient method of estimating genetic merit using high-density genome-wide single nucleotide polymorphisms (SNPs). In this study, we investigate an approach to increase the efficiency of genomic prediction with genome-wide markers. The approach is feature selection based on genomic best linear unbiased prediction (GBLUP), a statistical method that predicts breeding values from SNPs for selection in animal and plant breeding. One objective of this study is the choice of kinship matrix for GBLUP. The G-matrix uses the information of genome-wide dense markers. We compare three kinds of kinship based on different combinations of centring and scaling of marker genotypes, to find a kinship approach suited to the resource population of Chinese Simmental beef cattle. SNPs can be used to estimate the kinship matrix and individual inbreeding coefficients more accurately. In our research, a genomic relationship matrix was developed for 1,059 Chinese Simmental beef cattle using 640,000 SNPs, and breeding values were estimated using phenotypes for carcass weight and sirloin weight. The number of SNPs needed to accurately estimate a genomic relationship matrix was evaluated in this population. A further aim was to optimize the selection of markers and determine the number of SNPs required for estimating kinship in Chinese Simmental beef cattle. We find that feature selection of GBLUP using Xu’s kinship and Astle and Balding’s kinship performed similarly well, and these were the best-performing methods in our study. Inbreeding and the kinship matrix can be estimated with high accuracy using ≥12,000 SNPs in Chinese Simmental beef cattle.
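
To make the idea of "different combinations of centring and scaling" concrete, here is a minimal sketch that builds two marker-based kinship matrices from 0/1/2 genotype codes. It is not the authors' implementation: the function names, the toy genotypes, and the choice of these two particular formulas (a centred-only matrix with a single global scaling factor, and a matrix centred and scaled marker by marker in the style usually attributed to Astle and Balding) are assumptions made for illustration.

```python
def allele_freqs(genos):
    """Frequency of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(genos)
    return [sum(col) / (2 * n) for col in zip(*genos)]


def kinship_centred(genos):
    """Centred only: G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z = X - 2p."""
    p = allele_freqs(genos)
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    z = [[x - 2 * pj for x, pj in zip(row, p)] for row in genos]
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


def kinship_scaled(genos):
    """Centred and scaled per marker, then averaged over polymorphic markers."""
    p = allele_freqs(genos)
    keep = [j for j, pj in enumerate(p) if 0 < pj < 1]  # skip monomorphic markers
    w = [[(row[j] - 2 * p[j]) / (2 * p[j] * (1 - p[j])) ** 0.5 for j in keep]
         for row in genos]
    m = len(keep)
    return [[sum(a * b for a, b in zip(wi, wk)) / m for wk in w] for wi in w]


# Toy data (hypothetical): 4 animals x 5 SNPs, coded as copies of one allele.
GENOS = [
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
]

if __name__ == "__main__":
    for name, fn in (("centred", kinship_centred), ("scaled", kinship_scaled)):
        print(name)
        for row in fn(GENOS):
            print("  " + " ".join(f"{v:6.3f}" for v in row))
```

Because both variants centre each marker on twice its sample allele frequency, every row of either matrix sums to zero; the variants differ only in how much weight low-frequency markers receive.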

metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Anna Cichonska, Juho Rousu, Pekka Marttinen, Antti J Kangas, Pasi Soininen, Terho Lehtimäki, Olli T Raitakari, Marjo-Riitta Järvelin, Veikko Salomaa, Mika Ala-Korpela, Samuli Ripatti, Matti Pirinen
doi: http://dx.doi.org/10.1101/022665

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analysing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of one or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.

Iron Age and Anglo-Saxon genomes from East England reveal British migration history

Stephan Schiffels, Wolfgang Haak, Pirita Paajanen, Bastien Llamas, Elizabeth Popescu, Louise Lou, Rachel Clarke, Alice Lyons, Richard Mortimer, Duncan Sayer, Chris Tyler-Smith, Alan Cooper, Richard Durbin
doi: http://dx.doi.org/10.1101/022723

British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, dating from 2,300 to 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine-scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

CUA: a Flexible and Comprehensive Codon Usage Analyzer

Zhenguo Zhang
doi: http://dx.doi.org/10.1101/022814

Codon usage bias (CUB) is pervasive in genomes. Studying its patterns and causes is fundamental for understanding genome evolution. Rapidly emerging large-scale RNA and DNA sequences make studying CUB in many species feasible. Existing software, however, is limited in incorporating the new data resources. Therefore, I release the software CUA, which can compute all popular CUB metrics, including CAI, tAI, Fop, and ENC. More importantly, CUA allows users to incorporate user-specific data, such as tRNA abundance and highly expressed genes from the tissues under consideration; this flexibility enables computing CUB metrics for any species with improved accuracy. In sum, CUA eases codon usage studies and establishes a platform for incorporating new metrics in the future. CUA is available at http://search.cpan.org/dist/Bio-CUA/ with help documentation and a tutorial.
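
For readers unfamiliar with the metrics named above, the following is a small sketch of one of them, the codon adaptation index (CAI), computed from a user-supplied set of highly expressed reference genes. It does not use or mirror CUA's interface (CUA is a Perl distribution); the function names, the standard genetic code table, and the convention of replacing zero reference counts with 0.5 are assumptions chosen for illustration.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, fam in families.items():
        if aa == "*" or len(fam) == 1:  # stop codons, Met and Trp carry no information
            continue
        fam_counts = {c: counts[c] or 0.5 for c in fam}  # assumed pseudocount for zeros
        top = max(fam_counts.values())
        for c, n in fam_counts.items():
            weights[c] = n / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the informative codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical reference set: Ala uses GCT 3x and GCC 1x; Lys uses AAA 2x and AAG 1x.
    w = relative_adaptiveness(["GCTGCTGCTGCCAAAAAAAAG"])
    print(cai("GCTAAA", w), cai("GCCAAG", w))
```

Swapping in a different reference set (for example, genes highly expressed in a particular tissue) changes the weights and hence the index, which is the kind of user-specific input the abstract describes.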

Replaying Evolution to Test the Cause of Extinction of One Ecotype in an Experimentally Evolved Population

Caroline B. Turner, Zachary D. Blount, Richard E. Lenski
doi: http://dx.doi.org/10.1101/022798

In a long-term evolution experiment with Escherichia coli, bacteria in one of twelve populations evolved the ability to consume citrate, a previously unexploited resource in a glucose-limited medium. This innovation led to the frequency-dependent coexistence of citrate-consuming (Cit+) and non-consuming (Cit–) ecotypes, with Cit– bacteria persisting on the exogenously supplied glucose as well as other carbon molecules released by the Cit+ bacteria. After more than 10,000 generations of coexistence, however, the Cit– lineage went extinct; cells with the Cit– phenotype dropped to levels below detection, and the Cit– clade could not be detected by molecular assays based on its unique genotype. We hypothesized that this extinction event was a deterministic outcome of evolutionary change within the population, specifically the appearance of a more-fit Cit+ ecotype that competitively excluded the Cit– ecotype. We tested this hypothesis by re-evolving the population from one frozen sample taken just prior to the extinction and from another sample taken several thousand generations earlier, in each case for 500 generations and with 20-fold replication. To our surprise, the Cit– type did not go extinct in any of these replays, and Cit– cells also persisted in a single replicate that was propagated for 3,000 generations. Even more unexpectedly, we showed that the Cit– ecotype could reinvade the Cit+ population after its extinction. Taken together, these results indicate that the extinction of the Cit– ecotype was not a deterministic outcome driven by competitive exclusion by the Cit+ ecotype. The extinction also cannot be explained by demographic stochasticity, as the population size of the Cit– ecotype should have been many thousands of cells even during the daily transfer events. Instead, we infer that the extinction must have been caused by a rare chance event in which some aspect of the experimental conditions was inadvertently perturbed.

Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models

Feng Gao, Alon Keinan
doi: http://dx.doi.org/10.1101/022574

The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetics studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave an excess of extremely rare variants unexplained. This suggests that human populations might have experienced recent growth at a speed faster than exponential. Recent studies have introduced a generalized growth model in which the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in publicly available software. Investigating the power to infer deviation of growth from exponential, we observed that decent sample sizes facilitate accurate inference; e.g., a sample of 3,000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by 10% or more from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (p-value = 3.85 × 10^-6). The estimated growth speed significantly deviates from exponential (p-value << 10^-12), with the best-fit estimate being a growth speed 12% faster than exponential.
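
As a reminder of what the SFS is, the sketch below tallies an unfolded spectrum from a small matrix of haplotypes and prints it next to the proportions expected under the standard constant-size neutral model, in which the expected count of sites with i derived copies is proportional to 1/i. It does not implement the generalized growth expressions described in the abstract; the function names and toy haplotypes are hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts sites where the derived allele (coded 1)
    is carried by exactly i of the n sampled haplotypes.
    Sites that are not segregating in the sample are skipped."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):
        k = sum(site)
        if 0 < k < n:
            sfs[k - 1] += 1
    return sfs


def neutral_expected_proportions(n):
    """Expected SFS proportions for n haplotypes under the constant-size
    neutral model, where E[xi_i] is proportional to 1/i."""
    harmonic = sum(1 / i for i in range(1, n))
    return [1 / (i * harmonic) for i in range(1, n)]


if __name__ == "__main__":
    # Toy sample (hypothetical): 4 haplotypes x 5 sites, 1 = derived allele.
    haps = [
        [0, 1, 0, 0, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [0, 1, 1, 0, 1],
    ]
    observed = site_frequency_spectrum(haps)
    total = sum(observed)
    for i, (obs, exp) in enumerate(zip(observed, neutral_expected_proportions(len(haps))), 1):
        print(f"{i} copies: observed {obs / total:.3f}, neutral expectation {exp:.3f}")
```

In real data, a singleton class far above its expected proportion is the kind of excess of extremely rare variants that motivates growth models.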

Monoallelic methylation and allele specific expression in a social insect

Kate D Lee, Zoe N Lonsdale, Maria Kyriakidou, Despina Nathanael, Harindra E Amarasinghe, Eamonn B Mallon
doi: http://dx.doi.org/10.1101/022657

Social insects are emerging models for epigenetics. Here we examine the link between monoallelic methylation and monoallelic expression in the bumblebee Bombus terrestris using whole methylome and transcriptome analysis. We found nineteen genes displaying monoallelic methylation and expression. They were enriched for functions related to social organisation in the social insects. These are the biological processes predicted to involve imprinting by evolutionary theory.

Investigating the Evolutionary Importance of Denisovan Introgressions in Papua New Guineans and Australians

Ya Hu, Qiliang Ding, Yi Wang, Shuhua Xu, Yungang He, Minxian Wang, Jiucun Wang, Li Jin
doi: http://dx.doi.org/10.1101/022632

Previous research reported that Papua New Guineans (PNG) and Australians carry introgressions from Denisovans. Here we present a genome-wide analysis of Denisovan introgressions in PNG and Australians. We first developed a two-phase method to detect Denisovan introgressions from whole-genome sequencing data. This method has relatively high detection power (79.74%) and a low false positive rate (2.44%) based on simulations. Using this method, we identified 1.34 Gb of Denisovan introgressions from sixteen PNG and four Australian genomes, in which we identified 38,877 Denisovan introgressive alleles (DIAs). We found that 78 Denisovan introgressions were under positive selection. Genes located in the 78 introgressions are related to evolutionarily important functions, such as spermatogenesis, fertilization, cold acclimation, circadian rhythm, development of brain, neural tube, face, and olfactory pit, immunity, etc. We also found that 121 DIAs are missense. Genes harboring the 121 missense DIAs are also related to evolutionarily important functions, such as female pregnancy, development of face, lung, heart, skin, nervous system, and male gonad, visual and smell perception, response to heat, pain, hypoxia, and UV, lipid transport, metabolism, blood coagulation, wound healing, aging, etc. Taken together, this study suggests that Denisovan introgressions in PNG and Australians are evolutionarily important, and may help PNG and Australians in local adaptation. In this study, we also proposed a method that could efficiently identify archaic hominin introgressions in modern non-African genomes.

The distribution and impact of common copy-number variation in the genome of the domesticated apple, Malus x domestica Borkh.

James Boocock, David Chagné, Tony R Merriman, Mik Black
doi: http://dx.doi.org/10.1101/021857

Background: Copy number variation (CNV) is a common feature of eukaryotic genomes, and a growing body of evidence suggests that genes affected by CNV are enriched in processes that are associated with environmental responses. Here we use next generation sequence (NGS) data to detect copy-number variable regions (CNVRs) within the Malus x domestica genome, as well as to examine their distribution and impact. Methods: CNVRs were detected using NGS data derived from 30 accessions of M. x domestica analysed using the read-depth method, as implemented in the CNVrd2 software. To improve the reliability of our results, we developed a quality control and analysis procedure that involved checking for organelle DNA, not repeat masking, and the determination of CNVR identity using a permutation testing procedure. Results: Overall, we identified 876 CNVRs, which spanned 3.5% of the apple genome. To verify that detected CNVRs were not artefacts, we analysed the B-allele frequencies (BAF) within a SNP array dataset derived from a screening of 185 individual apple accessions and found the CNVRs were enriched for SNPs having aberrant BAFs (P < 1e-13, Fisher’s exact test). Putative CNVRs overlapped 845 gene models and were enriched for resistance (R) genes (P < 1e-22, Fisher’s exact test). Of note is a cluster of resistance genes on chromosome 2 near a region containing multiple major gene loci conferring resistance to apple scab. Conclusion: We present the first analysis and catalogue of CNVRs in the M. x domestica genome. The enrichment of the CNVRs with R genes and their overlap with gene loci of agricultural significance draw attention to a form of unexplored genetic variation in apple. This research will underpin further investigation of the role that CNV plays within the apple genome.
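
The enrichment P-values quoted above come from Fisher's exact test. As a rough sketch of how such a one-sided enrichment test works on a 2x2 table, the function below sums the hypergeometric tail directly. The function and argument names are hypothetical, and the example table is a textbook-sized toy, not the study's counts.

```python
from math import comb


def fisher_enrichment_p(in_set_hit, in_set_miss, out_set_hit, out_set_miss):
    """One-sided Fisher's exact test for over-representation.

    Table layout (all counts are non-negative integers):
                         category   not category
        inside the set   in_set_hit   in_set_miss
        outside the set  out_set_hit  out_set_miss

    Returns P(X >= in_set_hit) when X is hypergeometric with the table margins fixed.
    """
    row1 = in_set_hit + in_set_miss    # e.g. genes overlapping a CNVR
    col1 = in_set_hit + out_set_hit    # e.g. all genes in the category of interest
    total = row1 + out_set_hit + out_set_miss
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(in_set_hit, upper + 1))
    return tail / comb(total, row1)


if __name__ == "__main__":
    # Toy table [[3, 1], [1, 3]]: tail = (16 + 1) / 70.
    print(fisher_enrichment_p(3, 1, 1, 3))
```

Exact integer arithmetic via `math.comb` keeps the tail sum exact until the final division, which matters when P-values are as small as those reported here.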

Accelerating Scientific Publication in Biology

Ronald D Vale
doi: http://dx.doi.org/10.1101/022368

Scientific publications enable results and ideas to be transmitted throughout the scientific community. The number and type of journal publications have also become the primary criteria used in evaluating career advancement. Our analysis suggests that publication practices have changed considerably in the life sciences over the past thirty years. Considerably more experimental data is now required for publication, and the average time required for graduate students to publish their first paper has increased and is approaching the desirable duration of Ph.D. training. Since publication is generally a requirement for career progression, schemes to reduce the time of graduate student and postdoctoral training may be difficult to implement without also considering new mechanisms for accelerating communication of their work. The increasing time to publication also delays potential catalytic effects that ensue when many scientists have access to new information. The time has come for life scientists, funding agencies, and publishers to discuss how to communicate new findings in a way that best serves the interests of the public and scientific community.