Hi,
A simple task.. or should be. I need to extract the average coverage per feature in a bam file. I have a genbank and bed file for the reference the bam was mapped to. if I map with e.g. Geneous I can see good, variable coverage over the reference genome. I have tried GATK (could not get to run) and Bedtools (genomecov and coverage) -coverage
will give me an output file but all the features have zero coverage..
Here's the top of the .bed file:
track name="Example E.coli"
o26chr.gb 189 255 thrL gene 0 +
o26chr.gb 189 255 thrL CDS 0 +
o26chr.gb 336 2799 thrA gene 0 +
o26chr.gb 336 2799 thrA CDS 0 +
o26chr.gb 2800 3733 thrB gene 0 +
o26chr.gb 2800 3733 thrB CDS 0 +
o26chr.gb 3733 5020 thrC gene 0 +
o26chr.gb 3733 5020 thrC CDS 0 +
o26chr.gb 5233 5530 yaaX gene 0 +
Here's the top of the output from bedtools coverage -ibam file.bam -b file.bed
o26chr.gb 1047122 1048841 poxB gene 0 - 0 0 1719 0.0000000
o26chr.gb 1047122 1048841 poxB CDS 0 - 0 0 1719 0.0000000
o26chr.gb 2096828 2097287 gene 0 + 0 0 459 0.0000000
o26chr.gb 3144900 3148635 yfaL gene 0 - 0 0 3735 0.0000000
o26chr.gb 3144900 3148635 yfaL CDS 0 - 0 0 3735 0.0000000
o26chr.gb 4194149 4194368 tdcR gene 0 + 0 0 219 0.0000000
o26chr.gb 4194149 4194368 tdcR CDS 0 + 0 0 219 0.0000000
So all the coverages are zero, but this is not true. Anyone have any idea how to do this before I start writing my own script (again!).
Thanks.
What was wrong with GATK ? Its pretty straight forward command.java -jar GenomeAnalysisTK.jar -T DepthOfCoverage --minBaseQuality 20 -R ref.fasta -I foo.bam -L features.list -o cov.txt
Oops ! Sorry.. This would give per base coverage. Maybe you should try countOverlaps from IRanges package in R.
Have a look at bedcov from the samtools package.
and look for a never version of samtools (I guess version 1.xxx is already out there)