Extract coverage per feature from a bam and bed to a file
1
1
Entering edit mode
10.3 years ago
susan.klein ▴ 80

Hi,

A simple task.. or should be. I need to extract the average coverage per feature in a bam file. I have a genbank and bed file for the reference the bam was mapped to. if I map with e.g. Geneous I can see good, variable coverage over the reference genome. I have tried GATK (could not get to run) and Bedtools (genomecov and coverage) -coverage will give me an output file but all the features have zero coverage..

Here's the top of the .bed file:

track name="Example E.coli"
o26chr.gb    189    255    thrL gene    0    +
o26chr.gb    189    255    thrL CDS    0    +
o26chr.gb    336    2799    thrA gene    0    +
o26chr.gb    336    2799    thrA CDS    0    +
o26chr.gb    2800    3733    thrB gene    0    +
o26chr.gb    2800    3733    thrB CDS    0    +
o26chr.gb    3733    5020    thrC gene    0    +
o26chr.gb    3733    5020    thrC CDS    0    +
o26chr.gb    5233    5530    yaaX gene    0    +

Here's the top of the output from bedtools coverage -ibam file.bam -b file.bed

o26chr.gb    1047122    1048841    poxB gene    0    -    0    0    1719    0.0000000
o26chr.gb    1047122    1048841    poxB CDS    0    -    0    0    1719    0.0000000
o26chr.gb    2096828    2097287    gene    0    +    0    0    459    0.0000000
o26chr.gb    3144900    3148635    yfaL gene    0    -    0    0    3735    0.0000000
o26chr.gb    3144900    3148635    yfaL CDS    0    -    0    0    3735    0.0000000
o26chr.gb    4194149    4194368    tdcR gene    0    +    0    0    219    0.0000000
o26chr.gb    4194149    4194368    tdcR CDS    0    +    0    0    219    0.0000000

So all the coverages are zero, but this is not true. Anyone have any idea how to do this before I start writing my own script (again!).

Thanks.

bedtools coverage bed bam • 6.0k views
ADD COMMENT
0
Entering edit mode

What was wrong with GATK ? Its pretty straight forward command.

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage --minBaseQuality 20 -R ref.fasta -I foo.bam -L features.list -o cov.txt

Oops ! Sorry.. This would give per base coverage. Maybe you should try countOverlaps from IRanges package in R.

ADD REPLY
0
Entering edit mode

Have a look at bedcov from the samtools package.

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-44428cd

Usage:   samtools <command> [options]

Command: view        SAM<->BAM conversion
         sort        sort alignment file
         mpileup     multi-way pileup
         depth       compute the depth
         faidx       index/extract FASTA
         tview       text alignment viewer
         index       index alignment
         idxstats    BAM index stats (r595 or later)
         fixmate     fix mate information
         flagstat    simple stats
         calmd       recalculate MD/NM tags and '=' bases
         merge       merge sorted alignments
         rmdup       remove PCR duplicates
         reheader    replace BAM header
         cat         concatenate BAMs
         bedcov      read depth per BED region
         targetcut   cut fosmid regions (for fosmid pool only)
         phase       phase heterozygotes
         bamshuf     shuffle and group alignments by name

and look for a never version of samtools (I guess version 1.xxx is already out there)

ADD REPLY
4
Entering edit mode
10.3 years ago

Note that the chromosome names MUST match between the bed file and the bam file when using any of these tools. I suspect that might be the reason for zero coverage on all features.

ADD COMMENT
2
Entering edit mode

This sounds very plausible, and one of those magic secrets you just never know, unless you went to Hogwarts.. thanks Sean, I'll give it a go.

ADD REPLY
1
Entering edit mode

Yes! Thanks very much, that worked.

ADD REPLY
0
Entering edit mode

My daughter will love to hear that I went to Hogwarts!

ADD REPLY

Login before adding your answer.

Traffic: 2105 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6