Hi guys, thanks in advance for help.
I currently calculate the on-target depth of coverage for a sample using samtools depth:
samtools depth -a -b bedfile.bed dupmrked_sorted_bamfile.bam > output_bamdepth.txt
This outputs the depth at every position (including those with 0 coverage) within the regions in the bedfile; reads flagged as duplicates are excluded by samtools depth's default filter.
To get a mean value I just run:
awk '{s+=$3}END{print s/NR}' output_bamdepth.txt
This sums the depth column from the samtools output and divides by the number of positions (with -a, that is every position in the target regions, not just covered ones), giving the mean on-target depth.
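As a self-contained check of that averaging step (the sample data below stands in for output_bamdepth.txt; the guard avoids dividing by zero on an empty file):

```shell
# three-column "chrom  pos  depth" sample, as produced by samtools depth
printf 'chr1\t100\t10\nchr1\t101\t0\nchr1\t102\t20\n' > example_depth.txt
# sum depths and divide by the number of positions reported
awk '{s += $3} END { if (NR > 0) printf "%.2f\n", s/NR; else print "no positions" }' example_depth.txt
# → 10.00
```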
My issue is that the input reads are paired-end. After sorting and marking duplicates with Picard, each mate (forward or reverse) is stored as its own record in the BAM. This means a position may be covered by both mates of the same pair, so it is sometimes double counted and sometimes not.
This is just as much a biological problem as it is a computer problem. We have not tried to optimize for shearing size in comparison to read length which would end up providing us with a sweet spot.
If we have a 250 bp fragment and do 2 × 150 bp paired-end sequencing, then R1 covers part of the fragment, R2 covers part of the fragment, and the middle is covered by both R1 and R2 from the same pair. This overlap region gives false double coverage. Does anyone have a way of overcoming this?
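One way to handle this directly on the existing BAM: newer samtools releases offer a -s flag for samtools depth that counts the overlapping section of a read pair only once (check `samtools depth` usage in your installed version before relying on it). A sketch reusing the filenames from the question above:

```shell
# -s: count the overlapping section of a mate pair only once
# (flag availability depends on samtools version; otherwise the
#  rest of the command is unchanged from the original pipeline)
samtools depth -a -s -b bedfile.bed dupmrked_sorted_bamfile.bam > output_bamdepth_nooverlap.txt
awk '{s += $3} END { if (NR > 0) print s/NR }' output_bamdepth_nooverlap.txt
```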
Thanks, Dennis
You could merge overlapping read pairs (e.g. with FLASH, BBMerge, ...) and then process the merged reads as single-end.
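For example, with BBMerge from the BBTools suite (the FASTQ filenames here are placeholders, not from the original pipeline):

```shell
# merge mates whose fragments overlap into single reads;
# pairs that do not overlap go to the outu1/outu2 files
bbmerge.sh in1=reads_R1.fastq.gz in2=reads_R2.fastq.gz \
    out=merged.fastq.gz outu1=unmerged_R1.fastq.gz outu2=unmerged_R2.fastq.gz
```

The merged and unmerged reads can then be aligned and fed through the same depth calculation without any mate overlap.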
I appreciate that, but I am not looking to modify the reads or generate new data. I think this can be done using the information already in the BAM records, but I am not sure how.