I know there are already one or two great threads on using genomecov for coverage calculation. However, I was playing around with this tool and tried it on my VCF files, hoping it would save me time compared with calculating the coverage from the original BAM files.
I noticed the coverage for my case genomes stopped at 10 (I did not include the -max flag), with a fraction of 0.99 at 0 coverage. Is this because it is using AD or DP to calculate the coverage? If my variant caller, GATK, annotated these two values after its own calculations, might that explain the strange coverage?
What concerned me more was that the controls, which are 1000G genomes that I had aligned to hg19, showed depths of only 0 and 1.
When I went back to the BAM files, it looked much better, with a nice histogram distribution of coverage up to 1000x, requiring me to use -max.
Should I just stay away from calculating coverage with VCFs and use the original BAMs? I wish there was an easy way to get just the average depth for each sample without this much processing (i.e., running genomecov on each BAM and then scripting an average of the coverage).
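
For reference, the averaging step I have in mind would look roughly like the sketch below. It assumes the default histogram output of genomecov on a BAM (tab-separated columns: chromosome, depth, number of bases at that depth, chromosome size, fraction), including the summary rows labelled "genome". The function name and the sample rows are made up for illustration. One caveat: if -max is set, every position at or above the cap falls into a single bin, so the mean computed this way would be an underestimate.

```python
def mean_depth_from_hist(lines, chrom="genome"):
    """Return the mean depth for one chromosome label in a genomecov histogram.

    `lines` is any iterable of tab-separated histogram rows:
    chrom, depth, bases_at_depth, chrom_size, fraction.
    """
    weighted_sum = 0  # sum of depth * bases_at_depth
    total_bases = 0   # sum of bases_at_depth
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and rows for other chromosomes.
        if len(fields) < 5 or fields[0] != chrom:
            continue
        depth = int(fields[1])
        n_bases = int(fields[2])
        weighted_sum += depth * n_bases
        total_bases += n_bases
    if total_bases == 0:
        raise ValueError(f"no histogram rows found for {chrom!r}")
    return weighted_sum / total_bases


# Hypothetical histogram rows, for illustration only (not real data).
SAMPLE_HIST = [
    "chr1\t0\t50\t100\t0.5\n",
    "chr1\t1\t30\t100\t0.3\n",
    "chr1\t2\t20\t100\t0.2\n",
    "genome\t0\t50\t100\t0.5\n",
    "genome\t1\t30\t100\t0.3\n",
    "genome\t2\t20\t100\t0.2\n",
]

print(mean_depth_from_hist(SAMPLE_HIST))
```

For a real sample, the same function could be given an open file handle over the saved genomecov output instead of the in-memory list.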