I am trying to use Bedtools to read-depth counts to generate plots of RNA-Seq read coverage like one sees in genome browsers. I am running into difficulty doing this in a strand specific manner. I know my data is strand-specific and I am informed that the alignment was done using Tophat2 specifying that the library is stranded.
I found that I could do a lot using the coveregeBed function to get most of the way there like so:
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d > coverage.txt
Because BAM files are quite large I subseted the Bedfile to focus on one gene I am familiar with and was able to produce a plausible graph where the read peaks all matched annotated exons. I then sought to split these counts by their strands by adding the -s
and -S
flags in sequential coverageBed
calls as suggested in the docs like so:
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -s > coverage_F.txt
coverageBed -abam ALIGNMENT.bam -b FILE.bed -split -d -S > coverage_R.txt
However when I plotted these they were exactly the same (albeit slightly different in shape from the total coverage) which is implausible as I would have thought that any anti-sense transcription should to be lower than the sense expression and not have exactly the same pattern. These are the graphs:
I then tried a different approach where I first interected my subsetted BED file with the BAM file and the called genomeCoverageBed
like so:
genomeCoverageBed -ibam SUBSETED.bam -d -split -strand + > coverage_f_geocov.txt
genomeCoverageBed -ibam SUBSETED.bam -d -splitenter code here -strand - > coverage_r_geocov.txt
This way I got the same result. Crucially the two strand specific read depth files, while identical, do not add up to the total number of reads when I plot it non strand specifically. If I add the forward and reverse strands together and plot them I get this when:
What is going on and how can I get strand specific read-depth counts?
Thanks in advance