I have a highly polymorphic genome (about a 10% polymorphism rate), and I expect this variation to be due to heterozygosity. What is the best way to phase my haplotypes? My data are Illumina whole-genome sequencing reads in FASTQ format, which I aligned to a reference using bwa.
I found that samtools has a module called phase, and I ran it on my .bam file. However, I have no idea how to analyze the output. I want to extract the phased haplotypes it builds and measure their frequencies.
Unfortunately, I was never able to use this myself, but I did get some information from the authors:
The algorithm is very simple, but it does not work well (i.e. it produces switch errors) when there are long gaps between markers. It is a score-based HMM. The hidden states are all possible 15-marker haplotypes. The best phase, in terms of minimum error correction (the so-called MEC problem), is found by straightforward dynamic programming. The method is described in a paper published in Bioinformatics in 2010 ("Optimal algorithms for haplotype assembly from whole-genome sequence data"), though I found the algorithm independently; it is very simple. The paper also describes an algorithm to eliminate the long-gap limitation, but it is not implemented in samtools.
The method is robust to sequencing errors and, to some extent, to mapping errors. As it was primarily designed for fosmid pool sequencing (Kitzman et al.), it is also implemented to correct switch errors caused by wrong fosmid identification.
Heng
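To make the MEC objective mentioned above a bit more concrete, here is a minimal sketch in Python. It is not the samtools implementation: instead of the HMM and dynamic programming over 15-marker windows, it simply tries every haplotype for a handful of markers and keeps the one that needs the fewest allele corrections. The function names (`mec_cost`, `best_phase`) and the toy fragments are made up for illustration, and alleles are assumed to be coded as 0/1 at heterozygous sites only.

```python
from itertools import product


def mec_cost(haplotype, fragments):
    """Number of allele corrections needed so that every fragment agrees
    with either `haplotype` or its complement (the other chromosome copy)."""
    total = 0
    for frag in fragments:
        mismatches = sum(1 for pos, allele in frag.items()
                         if haplotype[pos] != allele)
        # A fragment comes from one of the two copies; charge the cheaper one.
        total += min(mismatches, len(frag) - mismatches)
    return total


def best_phase(n_markers, fragments):
    """Exhaustive search over haplotypes (hypothetical helper, toy sizes only).

    The first allele is fixed to 0 because a haplotype and its complement
    describe the same phasing.
    """
    best = None
    for tail in product((0, 1), repeat=n_markers - 1):
        hap = (0,) + tail
        cost = mec_cost(hap, fragments)
        if best is None or cost < best[1]:
            best = (hap, cost)
    return best


# Toy data: each fragment maps marker index -> observed allele (0 or 1).
fragments = [
    {0: 0, 1: 0, 2: 1},
    {1: 1, 2: 0, 3: 1},   # read from the complementary haplotype
    {0: 0, 1: 0},
    {2: 1, 3: 0},
    {1: 0, 2: 1, 3: 1},   # conflicts with the fragment above at marker 3
]

haplotype, cost = best_phase(4, fragments)
print(haplotype, cost)
```

The exhaustive search is exponential in the number of markers, which is why a real phaser restricts itself to short windows of markers and chains them together; long gaps between markers leave few reads linking neighbouring windows, which is where switch errors come from.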
I have the same problem. I have used linkSNPs before, and now I want to use the samtools phase module, but I couldn't find any manual on how to use it. Did you find an answer to your question? How did you analyze the output? I would be grateful if you could share your experience with me.
Thank you
written 10.3 years ago by moha (updated 2.9 years ago by Ram)