Question

Calculate the area covered more than a certain depth

0

Entering edit mode

5.6 years ago

Floydian_slip ▴ 170

Hi, I have an exome BAM file and a BED file of CCDS. How can I find out what fraction of my CCDS is covered at a depth higher than a threshold? Eg., I want to know what fraction of CCDS is above 100X.

Thanks!

next-gen sequencing bam coverage • 1.5k views

ADD COMMENT • link updated 5.6 years ago by Jianyu ▴ 580 • written 5.6 years ago by Floydian_slip ▴ 170

score 0 · Answer 1 · 2019-10-15

0

Entering edit mode

5.6 years ago

Jianyu ▴ 580

You can sort and index your bam file, then count the number of reads located in each region in bed file.

A simple example:

samtools sort exome.bam -o exome.sort.bam
samtools index exome.sort.bam
old_IFS=$IFS
IFS=$'\n'
for line in `cat CCDS.bed`; do
    region=`echo $line | awk '{printf "%s:%d-%d", $1, $2, $3}'`
    # count the number of reads located in each region
    samtools view exome.sort.bam $region | wc -l
done
IFS=$old_IFS

You can use -@ to use more threads to speed up the iteration

ADD COMMENT • link 5.6 years ago by Jianyu ▴ 580

0

Entering edit mode

Hi yztxwd, this will give me the number of reads in each region. I am looking for number of bases in each region where the coverage is more than a certain depth (say 100X). Can I somehow replace the last 'samtools view' command above to count the bases above 100X? Thanks!

ADD REPLY • link 5.6 years ago by Floydian_slip ▴ 170

0

Entering edit mode

Hi, I think you can just replace the $region with each base in region, like chr1:10000-10000, but iterate each base will dramatically increase the computation time.

May I ask why you want per base coverage? I think maybe a better choice is to use bamCoverage (I have never done it on whole genome, so I am not sure about running time), you can set the bin size to 1 to count per base coverage, then use the base coordinates in your region file to compute the fraction by other tools (like python or R)

ADD REPLY • link 5.6 years ago by Jianyu ▴ 580

0

Entering edit mode

More than the per base coverage, I am looking for the fraction of region in the bed file which is more than 100X to see how the experiment performed. Basically, I want to know how much fraction of the exome is covered with 100X or more. Do you know of a quick way to do that given the Bam file and the exome bed file? Thanks for your help.

ADD REPLY • link 5.5 years ago by Floydian_slip ▴ 170