# Mitochondrial Genomes of Domestic Animals Need Scrutiny

Ni-Ni Shi, Long Fan, Yong-Gang Yao, Min-Sheng Peng, Ya-Ping Zhang
(Submitted on 16 Jul 2014)

More than 1000 complete or near-complete mitochondrial DNA (mtDNA) sequences have been deposited in GenBank for eight common domestic animals (cattle, dog, goat, horse, pig, sheep, yak and chicken) and their close wild ancestors or relatives. Nevertheless, little effort has been made to evaluate the quality of these sequence data, which heavily affects the conclusions drawn from them. Here, we conducted a phylogenetic survey of these complete or near-complete mtDNA sequences based on mtDNA haplogroup trees for the eight animals. We show that errors due to artificial recombination, surplus of mutations, and phantom mutations exist in 14.5% (194/1342) of the mtDNA sequences, which should be treated with caution. We propose some caveats for future mtDNA studies of domestic animals.

# THE GENETIC LANDSCAPE OF TRANSCRIPTIONAL NETWORKS IN A COMBINED HAPLOID/DIPLOID PLANT SYSTEM

Jukka-Pekka Verta, Christian R Landry, John J MacKay

Heritable variation in gene expression is a source of evolutionary change, yet our understanding of the genetic basis of expression variation remains incomplete. Here, we dissected the genetic basis of transcriptional variation in a wild, outbreeding gymnosperm (Picea glauca) according to linked and unlinked genetic variants, their allele-specific (cis) and allele non-specific (trans) effects, and their phenotypic additivity. We used a novel plant system that is based on the analysis of segregating alleles of a single self-fertilized plant in haploid and diploid seed tissues. We measured transcript abundance and identified transcribed SNPs in 66 seeds with RNA-seq. Linked and unlinked genetic effects that influenced expression levels were abundant in the haploid megagametophyte tissue, influencing 48% and 38% of analyzed genes, respectively. Analysis of these effects in diploid embryos revealed that while distant effects were acting in trans, consistent with their hypothesized diffusible nature, local effects were associated with a complex mix of cis, trans and compensatory effects. Most cis effects were additive irrespective of their effect sizes, consistent with a hypothesis that they represent rate-limiting factors in transcript accumulation. We show that trans effects fulfilled a key prediction of Wright's physiological theory, in which variants with small effects tend to be additive and those with large effects tend to be dominant/recessive. Our haploid/diploid approach allows a comprehensive genetic dissection of expression variation and can be applied to a large number of wild plant species.

# Clades and clans: a comparison study of two evolutionary models

Sha Zhu, Cuong Than, Taoyang Wu
Subjects: Populations and Evolution (q-bio.PE)

The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model are two binary tree generating models that are widely used in evolutionary biology. Understanding the distributions of clade sizes under these two models provides valuable insights into macro-evolutionary processes, and is important in hypothesis testing and Bayesian analyses in phylogenetics. Here we show that these distributions are log-convex, which implies that very large clades or very small clades are more likely to occur under these two models. Moreover, we prove that there exists a critical value $\kappa(n)$ for each $n\geqslant 4$ such that for a given clade with size $k$, the probability that this clade is contained in a random tree with $n$ leaves generated under the YHK model is higher than that under the PDA model if $1<k<\kappa(n)$, and lower if $\kappa(n)<k<n$. Finally, we extend our results to binary unrooted trees, and obtain similar results for the distributions of clan sizes.
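
The crossover between the two models can be checked numerically for small trees. The sketch below is not taken from the paper: it assumes the standard counting argument for the PDA model (a fixed set of $k$ leaves forms a clade in $(2k-3)!!\,(2(n-k+1)-3)!!$ of the $(2n-3)!!$ rooted binary labeled trees) and the known closed form $\frac{2n}{k(k+1)}\binom{n}{k}^{-1}$ for the YHK model, both for $1 \leqslant k < n$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rooted_tree_count(m):
    """Number of rooted binary labeled trees on m leaves: (2m-3)!!, with value 1 for m = 1, 2."""
    result = 1
    for odd in range(3, 2 * m - 2, 2):
        result *= odd
    return result


def clade_prob_pda(n, k):
    """Probability that a fixed set of k leaves is a clade under PDA (uniform over labeled trees)."""
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    # Collapse the clade to a single leaf: a tree on k leaves times a tree on n-k+1 leaves.
    return Fraction(rooted_tree_count(k) * rooted_tree_count(n - k + 1), rooted_tree_count(n))


def clade_prob_yhk(n, k):
    """Probability that a fixed set of k leaves is a clade under YHK (assumed closed form)."""
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    return Fraction(2 * n, k * (k + 1) * comb(n, k))


def yhk_favoured_sizes(n):
    """Clade sizes k in 2..n-1 for which the YHK probability exceeds the PDA probability."""
    return [k for k in range(2, n) if clade_prob_yhk(n, k) > clade_prob_pda(n, k)]


if __name__ == "__main__":
    # For n = 4: YHK gives 2/9 vs PDA 1/5 at k = 2, and 1/6 vs 1/5 at k = 3.
    for k in (2, 3):
        print(k, clade_prob_yhk(4, k), clade_prob_pda(4, k))
    print(yhk_favoured_sizes(4))
```

For $n = 4$ the YHK model favours the clade of size 2 and the PDA model favours the clade of size 3, which agrees with a critical value $\kappa(4)$ lying between 2 and 3. Exact rational arithmetic is used so that the comparison is not affected by rounding.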

# Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data

Benchmark Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data
Katharina Hayer, Angel Pizzaro, Nicholas L Lahens, John B Hogenesch, Gregory R Grant

The advantages of RNA sequencing (RNA-Seq) suggest it will replace microarrays for highly parallel gene expression analysis. For example, in contrast to arrays, RNA-Seq is expected to be able to provide accurate identification and quantification of full-length transcripts. A number of methods have been developed for this purpose, but short, error-prone reads make it a difficult problem in practice. It is essential to determine which algorithms perform best, and where and why they fail. However, there is a dearth of independent and unbiased benchmarking studies of these algorithms. Here we take an approach using both simulated and experimental benchmark data to evaluate their accuracy. We conclude that most methods are inaccurate even using idealized data, and that no method is sufficiently accurate once complicating factors such as polymorphisms, intron signal, sequencing error, and multiple splice forms are present. These results point to the pressing need for further algorithm development.

# Stress-Induced Mutagenesis and Complex Adaptation

(Submitted on 14 Jul 2014)

# Bayesian Structured Sparsity from Gaussian Fields

Barbara E. Engelhardt, Ryan P. Adams
(Submitted on 8 Jul 2014)

Substantial research on structured sparsity has contributed to analysis of many different applications. However, there have been few Bayesian procedures among this work. Here, we develop a Bayesian model for structured sparsity that uses a Gaussian process (GP) to share parameters of the sparsity-inducing prior in proportion to feature similarity as defined by an arbitrary positive definite kernel. For linear regression, this sparsity-inducing prior on regression coefficients is a relaxation of the canonical spike-and-slab prior that flattens the mixture model into a scale mixture of normals. This prior retains the explicit posterior probability on inclusion parameters—now with GP probit prior distributions—but enables tractable computation via elliptical slice sampling for the latent Gaussian field. We motivate development of this prior using the genomic application of association mapping, or identifying genetic variants associated with a continuous trait. Our Bayesian structured sparsity model produced sparse results with substantially improved sensitivity and precision relative to comparable methods. Through simulations, we show that three properties are key to this improvement: i) modeling structure in the covariates, ii) significance testing using the posterior probabilities of inclusion, and iii) model averaging. We present results from applying this model to a large genomic dataset to demonstrate computational tractability.
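
To make the "GP probit prior" on inclusion concrete, the following minimal sketch draws one latent Gaussian field over a handful of features and maps it through the probit link to per-feature inclusion probabilities. It is an illustration under stated assumptions, not the authors' implementation: the squared-exponential kernel, the one-dimensional feature positions, the length scale and the jitter term are all hypothetical choices, and posterior inference (for example by elliptical slice sampling) is not shown.

```python
import math
import random


def squared_exponential_kernel(xs, length_scale=1.5, jitter=1e-6):
    """Assumed positive definite kernel: nearby features get similar latent values."""
    n = len(xs)
    return [
        [
            math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def probit_link(z):
    """Standard normal CDF, mapping a latent value to a probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_inclusion_probabilities(xs, rng):
    """Draw a zero-mean latent field with the kernel covariance, then apply the probit link."""
    low = cholesky(squared_exponential_kernel(xs))
    white = [rng.gauss(0.0, 1.0) for _ in xs]
    field = [sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(len(xs))]
    return [probit_link(g) for g in field]


if __name__ == "__main__":
    positions = [float(i) for i in range(8)]  # hypothetical feature positions
    print(sample_inclusion_probabilities(positions, random.Random(0)))
```

Because neighbouring positions are strongly correlated under the kernel, their inclusion probabilities tend to rise and fall together, which is the sense in which the prior shares sparsity information in proportion to feature similarity.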

# On the number of ranked species trees producing anomalous ranked gene trees

Filippo Disanto, Noah A. Rosenberg
Subjects: Populations and Evolution (q-bio.PE)

Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches 1. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs.
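
The limiting fraction above is taken relative to the total number of ranked labeled species trees. As background only, the sketch below computes that denominator using the standard result that a ranked labeled tree on $n$ leaves corresponds to a sequence of pairwise mergers, giving $\prod_{k=2}^{n}\binom{k}{2}$ trees, and compares it with the $(2n-3)!!$ unranked labeled topologies. The count of ARGT-producing trees is not reproduced, since the abstract does not state it; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def ranked_labeled_tree_count(n):
    """Ranked labeled binary trees on n leaves: one choice of pair to merge at each level."""
    count = 1
    for lineages in range(2, n + 1):
        count *= comb(lineages, 2)
    return count


def unranked_labeled_tree_count(n):
    """Rooted binary labeled topologies on n leaves: (2n-3)!!."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


def mean_rankings_per_topology(n):
    """Average number of distinct rankings carried by a labeled topology."""
    return Fraction(ranked_labeled_tree_count(n), unranked_labeled_tree_count(n))


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, ranked_labeled_tree_count(n), unranked_labeled_tree_count(n))
```

For four species there are 18 ranked labeled trees against 15 unranked topologies: each of the 12 caterpillar topologies admits a single ranking, while each of the 3 balanced topologies admits two.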