Calculate the area covered more than a certain depth
1
0
Entering edit mode
5.1 years ago
Floydian_slip ▴ 170

Hi, I have an exome BAM file and a BED file of CCDS. How can I find out what fraction of my CCDS is covered at a depth higher than a threshold? Eg., I want to know what fraction of CCDS is above 100X.

Thanks!

next-gen sequencing bam coverage • 1.3k views
ADD COMMENT
0
Entering edit mode
5.1 years ago
Jianyu ▴ 580

You can sort and index your bam file, then count the number of reads located in each region in bed file.

A simple example:

samtools sort exome.bam -o exome.sort.bam
samtools index exome.sort.bam
old_IFS=$IFS
IFS=$'\n'
for line in `cat CCDS.bed`; do
    region=`echo $line | awk '{printf "%s:%d-%d", $1, $2, $3}'`
    # count the number of reads located in each region
    samtools view exome.sort.bam $region | wc -l
done
IFS=$old_IFS

You can use -@ to use more threads to speed up the iteration

ADD COMMENT
0
Entering edit mode

Hi yztxwd, this will give me the number of reads in each region. I am looking for number of bases in each region where the coverage is more than a certain depth (say 100X). Can I somehow replace the last 'samtools view' command above to count the bases above 100X? Thanks!

ADD REPLY
0
Entering edit mode

Hi, I think you can just replace the $region with each base in region, like chr1:10000-10000, but iterate each base will dramatically increase the computation time.

May I ask why you want per base coverage? I think maybe a better choice is to use bamCoverage (I have never done it on whole genome, so I am not sure about running time), you can set the bin size to 1 to count per base coverage, then use the base coordinates in your region file to compute the fraction by other tools (like python or R)

ADD REPLY
0
Entering edit mode

More than the per base coverage, I am looking for the fraction of region in the bed file which is more than 100X to see how the experiment performed. Basically, I want to know how much fraction of the exome is covered with 100X or more. Do you know of a quick way to do that given the Bam file and the exome bed file? Thanks for your help.

ADD REPLY

Login before adding your answer.

Traffic: 1879 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6