Question

Use Bedtools to extract stranded read depth from BAM files

3

Entering edit mode

8.3 years ago

gtho123 ▴ 260

I am trying to use Bedtools to read-depth counts to generate plots of RNA-Seq read coverage like one sees in genome browsers. I am running into difficulty doing this in a strand specific manner. I know my data is strand-specific and I am informed that the alignment was done using Tophat2 specifying that the library is stranded.

I found that I could do a lot using the coveregeBed function to get most of the way there like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d > coverage.txt

Because BAM files are quite large I subseted the Bedfile to focus on one gene I am familiar with and was able to produce a plausible graph where the read peaks all matched annotated exons. I then sought to split these counts by their strands by adding the -s and -S flags in sequential coverageBed calls as suggested in the docs like so:

coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -s > coverage_F.txt
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -S > coverage_R.txt

However when I plotted these they were exactly the same (albeit slightly different in shape from the total coverage) which is implausible as I would have thought that any anti-sense transcription should to be lower than the sense expression and not have exactly the same pattern. These are the graphs:

Read_depth_plots

I then tried a different approach where I first interected my subsetted BED file with the BAM file and the called genomeCoverageBed like so:

genomeCoverageBed -ibam SUBSETED.bam -d -split -strand + > coverage_f_geocov.txt 
genomeCoverageBed -ibam SUBSETED.bam -d -splitenter code here -strand - > coverage_r_geocov.txt

This way I got the same result. Crucially the two strand specific read depth files, while identical, do not add up to the total number of reads when I plot it non strand specifically. If I add the forward and reverse strands together and plot them I get this when:

Overlap plot

What is going on and how can I get strand specific read-depth counts?

Thanks in advance

RNA-Seq Bedtools alignment • 3.9k views

ADD COMMENT • link updated 8.3 years ago by michael.ante ★ 3.9k • written 8.3 years ago by gtho123 ▴ 260

score 0 · Answer 1 · 2016-09-01

Hi,

First I'd check some alignment statistics with RSeQC to find out the strand specificity (infer_experiment.py), the alignment itself (bam_stat.py) and here focus on the proper-paired reads vs. the singletons, and the insert size (inner_distance.py).

After checking, I'd use for the coverageBed either only proper-paired reads or only Read1.

Cheers,

Michael