Escape from crossover interference increases with maternal age

Christopher L. Campbell, Nicholas A. Furlotte, Nick Eriksson, David Hinds, Adam Auton
(Submitted on 23 Aug 2014)

Recombination plays a fundamental role in meiosis, ensuring the proper segregation of chromosomes and contributing to genetic diversity by generating novel combinations of alleles. Using data derived from direct-to-consumer genetic testing, we investigated patterns of recombination in over 4,200 families. Our analysis revealed a number of sex differences in the distribution of recombination. We find the fraction of male events occurring within hotspots to be 4.6% higher than for females. We confirm that the recombination rate increases with maternal age, while hotspot usage decreases, with no such effects observed in males. Finally, we show that the placement of female recombination events becomes increasingly deregulated with maternal age, with an increasing fraction of events appearing to escape crossover interference.

IPED2: Inheritance Path based Pedigree Reconstruction Algorithm for Complicated Pedigrees

Dan He, Zhanyong Wang, Laxmi Parida, Eleazar Eskin
(Submitted on 23 Aug 2014)

Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. The problem is known to be NP-hard even for datasets known to contain only siblings. Some recent methods have been developed to accurately and efficiently reconstruct pedigrees. These methods, however, still consider relatively simple pedigrees; for example, they are not able to handle half-sibling situations where a pair of individuals share only one parent. In this work, we propose an efficient method, IPED2, based on our previous work, which specifically targets reconstruction of complicated pedigrees that include half-siblings. We note that the presence of half-siblings makes the reconstruction problem significantly more challenging, which is why previous methods exclude the possibility of half-siblings. We propose a novel model as well as an efficient graph algorithm, and experiments show that our algorithm achieves relatively accurate reconstruction. To our knowledge, this is the first method that is able to handle pedigree reconstruction based on genotype data only when half-siblings exist in any generation of the pedigree.

Population split time estimation and X to autosome effective population size differences inferred using physically phased genomes

Shiya Song, Elzbieta Sliwerska, Jeffrey M Kidd
doi: http://dx.doi.org/10.1101/008367

Haplotype-resolved genome sequence information is of growing interest due to its applications in both population genetics and medical genetics. Here, we assess the ability to correctly reconstruct haplotype sequences using fosmid pooled sequencing and apply the sequences to explore historical population relationships. We resolved phased haplotypes of sample NA19240, a trio child from the Yoruba HapMap collection, using pools of a total of 521,783 fosmid clones. We phased 93% of heterozygous SNPs into haplotype-resolved blocks, with an N50 size of 318 kb. Using trio information from HapMap, we linked adjacent blocks together to form paternal and maternal alleles, producing near-to-complete haplotypes. Comparison with 33 individual fosmids sequenced using capillary sequencing shows that our reconstructed sequence haplotypes have a sequence error rate of 0.005%. Utilizing fosmid-phased haplotypes from a Yoruba, a European and a Gujarati sample, we analyzed population history and inferred population split times. We date the initial split between Yoruba and out-of-Africa populations to 90,000-100,000 years ago with substantial gene flow occurring until nearly 50,000 years ago, and obtain congruent results with the autosomes and the X chromosome. We estimate that the initial split between European and Gujarati populations occurred around 45,000 years ago and gene flow ended around 28,000 years ago. Analysis of X vs autosome inferred effective population sizes reveals distinct epochs in which the ratio of the effective number of males to females changes. We find a period of female bias during the ancestral human lineage up to 1 million years ago and a short period of male bias in the Yoruba lineage from 160-400 thousand years ago. We demonstrate the construction of haplotype sequences of sufficient completeness and accuracy for population genetic analysis. As experimental and analytic methods improve, these approaches will continue to shed new light on the history of populations.

Sources of PCR-induced distortions in high-throughput sequencing datasets

Justus M Kebschull, Anthony M Zador
doi: http://dx.doi.org/10.1101/008375

PCR allows the exponential and sequence-specific amplification of DNA, even from minute starting quantities. Today, PCR is at the core of the most successful DNA sequencing technologies and is a fundamental step in preparing DNA samples for high-throughput sequencing. Despite its importance, we have little comprehensive understanding of the biases and errors that PCR introduces into pools of DNA molecules. Understanding PCR's imperfections and their impact on the amplification of different sequences in a complex mixture is particularly important for a proper understanding of high-throughput sequencing data. We examined the effects of bias, stochasticity, template switches and polymerase errors introduced during PCR on sequence representation in next-generation sequencing libraries. Using Illumina sequencing results of a pool of diverse PCR amplicons with a defined structure, we searched for signatures of each process. We further developed quantitative models for each process and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. PCR errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results will have particular relevance to single-cell sequencing, in which sequences are represented by only one or a few molecules.
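
To make the idea of PCR stochasticity concrete, here is a minimal sketch that treats amplification as a simple branching process: in every cycle each molecule is copied with a fixed probability. This is an illustrative toy model, not the quantitative model developed in the preprint; the function name `simulate_pcr` and the parameters `n_templates`, `cycles` and `efficiency` are assumptions introduced here.

```python
import random


def simulate_pcr(n_templates, cycles, efficiency, seed=0):
    """Toy branching-process model of PCR (hypothetical helper).

    Every template starts as a single molecule. In each cycle, each
    molecule is duplicated with probability `efficiency`. Returns the
    final copy number of each template.
    """
    rng = random.Random(seed)
    counts = [1] * n_templates
    for _ in range(cycles):
        # Each existing molecule independently succeeds or fails to copy.
        counts = [
            c + sum(rng.random() < efficiency for _ in range(c))
            for c in counts
        ]
    return counts


# All templates start equally represented, yet chance alone spreads
# their final copy numbers apart.
counts = simulate_pcr(n_templates=200, cycles=10, efficiency=0.8)
print("min copies:", min(counts), "max copies:", max(counts))
```

Because failures in the first few cycles are inherited by every later copy, templates that start from one molecule can end up with quite different abundances, which is in keeping with the abstract's emphasis on sequences represented by only one or a few molecules.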

Variation of nonsynonymous/synonymous rate ratios at HLA genes over time and phylogenetic context

Bárbara D Bitarello, Rodrigo dos Santos Francisco, Diogo Meyer
doi: http://dx.doi.org/10.1101/008342

Many HLA loci show an excess of nonsynonymous (dN) with respect to synonymous (dS) substitutions at codons of the antigen recognition site (ARS), a hallmark of adaptive evolution. However, it remains unclear how these changes are distributed over time and across branches of the HLA phylogeny. In particular, although HLA alleles can be assigned to functionally and phylogenetically defined groups (“lineages”), a test for differences in ω (ω = dN/dS) within and between lineages is lacking. We analysed variation of ω across divergence times and phylogenetic contexts (placement of branches in the phylogeny). We found a significant positive correlation between ω at ARS codons and divergence time, and that branches between lineages have higher ω than those within lineages. The excess of nonsynonymous changes between lineages attained significance when we used non-ARS codons to account for the fact that, even under purifying selection, ω is inflated for recently diverged alleles. Although less intensely selected, within-lineage variation at ARS codons bears evidence of selection, in the form of higher ω than those of non-ARS codons. Our results show that ω ratios of class I HLA genes vary over time, and are higher in branches connecting alleles from distinct lineages. These results suggest that although within-lineage variation bears evidence of balancing selection, the between-lineage changes have been more intensely selected. Our findings indicate the importance of considering the effect of timescale when analysing ω values over a wide spectrum of divergences, and the value of using additional markers (in our case the tightly linked non-ARS codons) to account for the temporal dynamics of ω.

Dead or just asleep? Variance of microsatellite allele distributions in the human Y-chromosome.

Joe Flood
doi: http://dx.doi.org/10.1101/008227

Several different methods confirm that a number of microsatellites on the human Y-chromosome have allele distributions with different variances in different haplogroups, after adjusting for coalescent times. This can be demonstrated both through heteroscedasticity tests and by poor correlation of the variance vectors in different subclades. The most convincing demonstration, however, is the complete inactivity of some markers in certain subclades – “microsatellite death”, while they are still active in companion subclades. Many microsatellites have declined in activity as they proceed down through descendant subclades. This appears to confirm the theory of microsatellite life cycles, in which point mutations cause a steady decay in activity. However, the changes are too fast to be caused by point mutations alone, and slippage events may be implicated. The rich microsatellite terrain exposed in our large single-haplotype samples provides new opportunities for genotyping and analysis.

Theoretical Foundations of Equitability and the Maximal Information Coefficient

Yakir A. Reshef, David N. Reshef, Pardis C. Sabeti, Michael Mitzenmacher
(Submitted on 21 Aug 2014)

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables (Reshef et al., 2011). MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets.
Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show that equitability is a generalization of power against statistical independence. Second, it allows us to compute and discuss the population value of MIC, which we call MIC_*. In doing so we generalize and strengthen the mathematical results proven in Reshef et al. (2011) and clarify the relationship between MIC and mutual information. Introducing MIC_* also enables us to reason about the properties of MIC more abstractly: for instance, we show that MIC_* is continuous and that there is a sense in which it is a canonical “smoothing” of mutual information. We also prove an alternate, equivalent characterization of MIC_* that we use to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known. Our hope is that this paper provides a richer theoretical foundation for MIC and equitability going forward.
This paper will be accompanied by a forthcoming companion paper that performs extensive empirical analysis and comparison to other methods and discusses the practical aspects of both equitability and the use of MIC and its related statistics.
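
As a rough illustration of the quantity that MIC is built from, the sketch below computes the mutual information of a data set discretized on a single fixed grid and divides it by the logarithm of the smaller grid dimension. It is not an implementation of MIC or MIC_*: the real statistic maximizes this normalized score over many grid resolutions and over the placement of grid lines, whereas this sketch assumes one resolution and equal-frequency bins. The helper names `equal_frequency_bins` and `normalized_grid_mi` are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def normalized_grid_mi(xs, ys, nx, ny):
    """Mutual information on one nx-by-ny grid, divided by log(min(nx, ny)).

    Assumption: a single fixed equal-frequency grid, with no search over
    grids as the full MIC statistic would require.
    """
    n = len(xs)
    bx = equal_frequency_bins(xs, nx)
    by = equal_frequency_bins(ys, ny)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over occupied cells.
    mi = sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )
    return mi / math.log(min(nx, ny))


xs = list(range(100))
print(normalized_grid_mi(xs, xs, 4, 4))                    # noiseless line
print(normalized_grid_mi(xs, [x % 7 for x in xs], 4, 4))   # weaker dependence
```

The normalization keeps the score between 0 and 1 on any grid, which is what lets scores from grids of different sizes be compared when the maximum is taken.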

Robust Population Structure Inference and Correction in the Presence of Known or Cryptic Relatedness

Matthew P Conomos, Michael B Miller, Timothy A Thornton
doi: http://dx.doi.org/10.1101/008276

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.

A Consistent Estimator of the Evolutionary Rate

Krzysztof Bartoszek, Serik Sagitov
(Submitted on 21 Aug 2014)

We consider a branching particle system where particles reproduce according to the pure birth Yule process with birth rate L, conditioned on the observed number of particles being equal to n. Particles are assumed to move independently on the real line according to Brownian motion with local variance s^2. In this paper we treat the n particles as a sample of related species. The spatial Brownian motion of a particle describes the development of a trait value of interest (e.g. log-body-size). We propose an unbiased estimator R_n^2 of the evolutionary rate r^2 = s^2/L. The estimator R_n^2 is proportional to the sample variance S_n^2 computed from the n trait values. We find an approximate formula for the standard error of R_n^2 based on a neat asymptotic relation for the variance of S_n^2.
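
The setting can be explored with a small simulation: grow a pure-birth tree to n lineages while each lineage's trait follows Brownian motion, then compute the sample variance of the n tip values. This is only a sketch under stated assumptions. It assumes the time spent with k lineages is exponential with rate k times the birth rate, including a final interval with n lineages, as one simple way to mimic conditioning on n. The constant that turns the sample variance into the paper's estimator is not given in the abstract and is not reproduced here. The function name `simulate_tip_traits` and its parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_tip_traits(n, birth_rate, local_variance, seed=0):
    """Simulate trait values at the n tips of a pure-birth tree.

    Assumption: the waiting time with k lineages is Exp(k * birth_rate),
    for k = 1..n, and traits evolve by independent Brownian motion with
    variance `local_variance` per unit time along every lineage.
    """
    rng = random.Random(seed)
    traits = [0.0]  # a single ancestral lineage with trait value 0
    while True:
        k = len(traits)
        dt = rng.expovariate(k * birth_rate)
        sd = math.sqrt(local_variance * dt)
        # Every current lineage drifts independently during the interval.
        traits = [x + rng.gauss(0.0, sd) for x in traits]
        if k == n:
            return traits
        # A uniformly chosen lineage splits; the daughter inherits its trait.
        traits.append(traits[rng.randrange(k)])


traits = simulate_tip_traits(n=50, birth_rate=1.0, local_variance=0.5)
sample_variance = statistics.variance(traits)
print("sample variance of tip traits:", sample_variance)
```

Repeating the simulation over many seeds gives an empirical picture of how variable the sample variance is for a given n, which is the quantity the abstract's standard-error formula addresses.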

Emergent speciation by multiple Dobzhansky-Muller incompatibilities

Tiago , Kevin E. Bassler, Ricardo B. R. Azevedo
doi: http://dx.doi.org/10.1101/008268

The Dobzhansky-Muller model posits that incompatibilities between alleles at different loci cause speciation. However, it is known that if the alleles involved in a Dobzhansky-Muller incompatibility (DMI) between two loci are neutral, the resulting reproductive isolation cannot be maintained in the presence of either mutation or gene flow. Here we propose that speciation can emerge through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation, a mechanism we call emergent speciation. We investigate emergent speciation using a haploid neutral network model with recombination. We find that certain combinations of multiple neutral DMIs can lead to speciation. Complex DMIs and a high recombination rate between the DMI loci facilitate emergent speciation. These conditions are likely to occur in nature. We conclude that the interaction between DMIs may be a root cause of the origin of species.