Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data

Darren Kessner, Tom Turner, John Novembre
(Submitted on 19 Sep 2012)

DNA samples are often pooled, either by experimental design, or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g. bacterial species comprising a microbiome, or pathogen strains in a blood sample). We present an expectation-maximization (EM) algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different strains within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.
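To make the idea in the abstract concrete, here is a minimal sketch of a generic EM loop for mixture proportions when the candidate haplotypes are known. It is not the authors' implementation: the function names, the read representation (a dict mapping position to base), and the uniform base-error model with a fixed `error_rate` are all assumptions made for illustration, and it expects at least one read and a nonzero error rate.

```python
def read_likelihood(read, haplotype, error_rate=0.01):
    """P(read | haplotype) under an assumed independent, uniform base-error model.

    `read` is a dict {position: base}; `haplotype` is a string of bases.
    """
    p = 1.0
    for pos, base in read.items():
        # A matching base has probability 1 - e; each of the 3 mismatches e / 3.
        p *= (1.0 - error_rate) if haplotype[pos] == base else error_rate / 3.0
    return p


def em_haplotype_frequencies(reads, haplotypes, error_rate=0.01,
                             max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies in a pool by EM (illustrative sketch)."""
    k = len(haplotypes)
    # Precompute the likelihood of every read under every known haplotype.
    lik = [[read_likelihood(r, h, error_rate) for h in haplotypes]
           for r in reads]
    freqs = [1.0 / k] * k  # start from uniform frequencies

    for _ in range(max_iter):
        counts = [0.0] * k
        for row in lik:
            # E-step: posterior probability that this read came from each haplotype.
            weighted = [f * l for f, l in zip(freqs, row)]
            total = sum(weighted)
            for j in range(k):
                counts[j] += weighted[j] / total
        # M-step: new frequencies are the average posterior assignments.
        new_freqs = [c / len(lik) for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Toy pool: two known haplotypes, four single-base reads at position 0.
    haplotypes = ["AC", "GT"]
    reads = [{0: "A"}, {0: "A"}, {0: "A"}, {0: "G"}]
    print(em_haplotype_frequencies(reads, haplotypes))
```

In the toy example, three of the four reads support the first haplotype, so its estimated frequency should land near the 3:1 read ratio, shifted slightly by the assumed error rate. A real analysis would take per-base qualities and mapped read positions from the alignment rather than a single fixed error rate.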

5 thoughts on “Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data”

This looks potentially interesting for a metagenomics application I am working on. But I can’t find the link to the software. The manuscript says it should be accessible via the authors’ websites. Anyone got a link?

cooplab on September 20, 2012 at 10:34 am said:

Shot John Novembre an email; hopefully there will be a response shortly.

Sorry about that — arXiv / Haldane’s Sieve moved faster than I did…
Code and executables for OSX and Linux are now posted on github here:
https://github.com/dkessner/harp
I added a minimal README which will help you decide whether the software will be useful to you — I’ll be working on more extensive docs and tutorial over the next week. Feel free to email me with any questions.

Darren, this is great. I am looking forward to trying it as I think it has a specific application for my project. I’ll get back to you with any feedback. Props for putting it up on arXiv!

Pingback: All the cool kids are on arXiv and Haldane’s Sieve .. why you should be too

Haldane's Sieve

Discussing preprints in population and evolutionary genetics
