11.3 years ago
stolarek.ir
Hi all
I am validating indels that I called myself against the 1000 Genomes VCF.
I am interested in using my BAM file and reference sequence to extract the regions that have 0 coverage.
Is there an easy way to do it?
If a read contains deletions, does your tool fall into the same trap as bedtools, which reports a deletion as a 0-coverage region?
I've added an option USECIGAR to my code: if set, the program will scan the CIGAR string and detect the deletions in the reference (slower, and requires more memory).
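For anyone wondering what scanning the CIGAR string involves, here is a minimal sketch in plain Python. It is not the code of the tool above. The function names and the `count_deletions` flag are made up for illustration, and I am assuming the intent is that bases inside a deletion spanned by a read should not be reported as uncovered. Reads are given as hypothetical `(start, cigar)` pairs with 0-based starts, and the per-base depth array is only practical for small regions.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def covered_blocks(start, cigar, count_deletions=True):
    """Return half-open (begin, end) reference intervals covered by one read.

    start is the 0-based leftmost reference position of the alignment.
    count_deletions is a hypothetical flag: when True, bases deleted in
    the read (CIGAR 'D') still count as covered.
    """
    pos = start
    blocks = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # aligned bases: consume reference, covered
            blocks.append((pos, pos + length))
            pos += length
        elif op == "D":            # deletion: consumes reference
            if count_deletions:
                blocks.append((pos, pos + length))
            pos += length
        elif op == "N":            # skipped region: consumes reference, not covered
            pos += length
        # I, S, H and P do not consume reference positions
    return blocks


def zero_coverage_regions(reads, ref_length, count_deletions=True):
    """Return half-open (begin, end) runs of positions with depth 0."""
    depth = [0] * ref_length
    for start, cigar in reads:
        for begin, end in covered_blocks(start, cigar, count_deletions):
            for i in range(max(begin, 0), min(end, ref_length)):
                depth[i] += 1

    regions = []
    run_start = None
    for i, d in enumerate(depth):
        if d == 0 and run_start is None:
            run_start = i
        elif d != 0 and run_start is not None:
            regions.append((run_start, i))
            run_start = None
    if run_start is not None:
        regions.append((run_start, ref_length))
    return regions
```

With a single read `(2, "3M2D3M")` on a 12 bp reference, setting `count_deletions=False` reports the two deleted bases as an extra 0-coverage region, which is the trap described in the comment above; with the flag on, only the flanks are reported.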
Cool, thanks. It's nice to be an inspiration for writing a new tool. A bit of background on why I am doing this: I am assessing a new data type, Moleculo, for picking up bigger indels (currently I am at 2000 bp of good-quality indels). To assess roughly how good Moleculo data is, I wanted to see how many of the indels from the 1000 Genomes VCF file it can pick up. The first result was moderate, so I went looking for the reason. It turned out pretty quickly that the culprit is a 20% lack of coverage in my genome (plus heterozygous events, for which I need even more coverage). So I filtered the 0-coverage indel positions out of the 1000 Genomes VCF file, and then did the correct assessment.
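The filtering step described above could look roughly like the sketch below. This is only an illustration, not the script actually used: the function names are hypothetical, variants are reduced to `(chrom, pos)` pairs, and the 0-coverage regions are assumed to be sorted, non-overlapping, half-open 0-based intervals per chromosome (as in BED). VCF `POS` is 1-based, so subtract 1 before passing positions in.

```python
from bisect import bisect_right


def in_any_interval(pos, starts, ends):
    """True if pos falls inside one of the sorted, non-overlapping intervals."""
    i = bisect_right(starts, pos) - 1   # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def drop_uncovered(variants, zero_regions):
    """Keep only the variants that do not sit in a 0-coverage region.

    variants:     list of (chrom, pos) with 0-based positions
    zero_regions: dict mapping chrom -> sorted list of (start, end) intervals
    """
    index = {
        chrom: ([s for s, _ in regions], [e for _, e in regions])
        for chrom, regions in zero_regions.items()
    }
    kept = []
    for chrom, pos in variants:
        starts, ends = index.get(chrom, ([], []))
        if not in_any_interval(pos, starts, ends):
            kept.append((chrom, pos))
    return kept
```

This checks only the start position of each indel. For large deletions it may be worth also requiring that the whole deleted span avoids the 0-coverage regions, depending on how strict the assessment should be.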