Hi all,
We would like to pile up millions of reads from a single amplicon for ultra-sensitive mutation detection.
Since SAMtools pileup is limited to several thousand reads at a given position, I am wondering whether you could suggest an alternative approach or workaround.
Any feedback is highly appreciated!
If you sequence that deep, how do you make sure that 99% of your reads are not PCR duplicates?
In fact they are PCR duplicates, as the reads derive from amplicons. But that's not a problem, we want to detect single reads out of more than one million having a specific (known) mutation.
"But that's not a problem, we want to detect single reads out of more than one million having a specific (known) mutation." So you just want to look at the CIGAR string of each read, is that right?
That is one additional approach we have already thought about. But with millions of reads we also have to think about sequencing errors, both at the mutation and at the flanking sites, so our mutation may be represented by different CIGAR strings. Further, e.g. any 4-base mutation at the same position will result in the same CIGAR string. Third, we may need to detect unknown mutations in a known hotspot in the future as well, which is why we need a flexible approach.
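To illustrate the point about CIGAR strings: the CIGAR alone does not say which base a read carries at a position, but it can be used to locate that base in the read sequence. Below is a minimal sketch of that idea, not a tested pipeline. It assumes plain-text SAM records (for example the output of `samtools view`), a single amplicon so the reference name is not checked, and no base-quality filtering. The names `base_at` and `count_bases` are made up for this example.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(pos, cigar, seq, target):
    """Return the read base aligned to 1-based reference position `target`.

    Returns '-' if the position is deleted or skipped in the read,
    and None if the read does not cover the position.
    """
    ref = pos  # 1-based reference coordinate of the next aligned base
    qry = 0    # 0-based index into the read sequence
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            if ref <= target < ref + length:
                return seq[qry + (target - ref)]
            ref += length
            qry += length
        elif op in "DN":           # consumes reference only
            if ref <= target < ref + length:
                return "-"
            ref += length
        elif op in "IS":           # consumes read only
            qry += length
        # H and P consume neither reference nor read
    return None

def count_bases(sam_lines, target):
    """Tally the base observed at `target` over an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):   # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & 0x4 or cigar == "*":  # unmapped or no alignment
            continue
        base = base_at(pos, cigar, seq, target)
        if base is not None:
            counts[base] += 1
    return counts
```

Because the records are streamed one at a time, memory use does not grow with depth; `count_bases(sys.stdin, target)` could be fed from `samtools view` through a pipe. Telling a true rare mutation from sequencing error would still need quality filtering and an error model on top of the raw counts.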
Shuffle and downsample your BAM? Or are you just searching for the reads having a SNP?
Yes, we want to detect those reads having a mutation.
Cross-posted on Samtools mailing list http://sourceforge.net/mailarchive/forum.php?thread_name=20854588711E4A489A3AD70C9BA5548A01AE472348A7%40XCH11.scidom.de&forum_name=samtools-help
Also cross-posted on SEQanswers: http://seqanswers.com/forums/showthread.php?t=41050