Hello, I would be really thankful if someone can help me.
I am trying to calculate coverage for my paired end reads. I want to calculate percent coverage per gene , per exon and per exome. i want it like following.( just an example)
**coverage** **percentage of analysable target bases with**
at least 1 read 98
at least 5 reads 97
at least 50 reads 96
and so one
I have used the samtools depth function samtools depth -a inputfile.bam -o outputfile
it is giving me a file very big (32GB and more) in the following form
chr1 1 0
chr1 2 0
chr1 3 0
chr1 4 0
chr1 5 0
chr1 6 0
chr1 7 0
chr1 8 0
chr1 9 0
chr1 10 0
chr1 11 0
chr1 12 0
chr1 13 0
chr1 14 0
chr1 15 0
chr1 16 0
chr1 17 0
chr1 18 0
chr1 19 0
chr1 20 0
chr1 21 0
chr1 22 0
chr1 23 0
chr1 24 0
chr1 25 0
chr1 26 0
chr1 27 0
chr1 28 0
I also tried GATK but couldn't use it as its still in its beta stage. looking forward for some guidelines.
Regards, sabeen
Did you go through these: ?
How to calculate read depth and coverage for whole exome sequencing?
Calculating Exome Coverage
Calculate Per Exon/Per Gene Coverage
The samtools output is normal and expected, it is the per-base coverage for every single nucleotide in the genome.
Thanks everyone for the help. Unfortunately i am still not able to solve my problems. I tried the samtools coverage as well as bedtools geneomecov. following is the code i used and the screen shot of some lines of result. looks like samtools coverage is giving me coverage per chromosome. however i need it per exon or per gene or per genome. in 1X , 2X .....100X form.
samtools coverage input.bam
I also tried samtools bedcov following is the result chromosome wise . and not percentage coverage. samtools bedcov region.bed input.bam
Also i used bedtools genomecov
(inplace of .genome i used hg19.fa. plz guide how to get .genome file)
i also tried samtools stats and samtools depth. however not getting desire result form.
i would be really thankfull if anyone ca guide me further. i am really stuck at this point.
thanks in advance
reagards, sabeen
When using bedtools genomecov for your purposes, the genome file is not needed. This simple command will give you the sample coverage in bedgraph format, which you could later intersect with your features of interest:
Again, considering your initial question, I must insist in using mosdepth, as it is indeed intended to answer queries like the one you have in a very efficient manner.
Thanks every one for the help. finally able to get the result with all the help here.