Hi all,
We used multiple PCR to prepare the amplicons and loaded on MiSeq to get the sequencing data.
We now want to get the average depth of coverage for targeted amplicon.
However, it seems that if the amplicons have overlapped regions, GATK will merge the interval region first, then report the depth. And GATK can not disable this "merging".
Is there anyway to figure out this accurate depth of coverage, since if we dump the depth from BAM by samtools "depth" command, we will have some regions with overlapped depths.
Best,
Junfeng
Brilliant answer thank you worked perfectly. Couple of points for anyone wanting to use the tool though, I was using it with a bed file generated from hg19 alignments using STAR, Chr 23 = X, Chr 24 = Y. Also make sure your bed coordinates are sorted so that start < end. I had a few alignments which were on the anti-sense strand so the coordinates were the wrong way around.
update 2021: there is now
samtools ampliconclip
samtools-ampliconclip http://www.htslib.org/doc/samtools-ampliconclip.htmlThanks.
However, I think it the two regions are overlapped, this overlapped region will have larger depth of coverage than the others, which is not exactly the real depth of the target region.
Say, the two regions are 200 bp each, of which 100 bp region is overlapped, each of the region is sequenced 100X,
then the depth of the overlapped 100 bp region will be 200X while the other is 100X.
And the original thought is to get the sequenced depth.
I still can not find the best solution to this,
1) mapping to each target region only, discard those mapped reads less than 50 bp? then calculate the depth of this reigon.
2) use samtools to dump the bam to depth of each position, then get the median value as the sequence depth.
However, none of these ideas is the best I considered.
if you have two 200bp regions sequenced at 100x with a 100bp overlap, then you have a 300bp region sequenced at 100x