Hi Everyone
I would nedd some help on genomeCoverageBed tool. This tools when used for finding per base genome coverage uses an option -d.
I am actually interested in finding read counts for each base within a particular intron of a gene.
I will like to explain you more just to make myself clear. I used IGV to see how my alignments looks and moreover what is the coverage of each base within a particular intron. When I take my cursor in IGV to the area exactly above the base (i am interested in)within the coverage track it gives me such details:
Total Count:6
A:0
C:0
G:6
T:0
N:0
Now this total count is basically the read count for the base G within that intron. This counts says that 6 reads have actually covered this base position(and hence base).
Now when i use this code snippet which is basically finding per base genome coverage
genomeCoverageBed -i 2-B3-1b-D303A_sorted.bed -g pombe.genome -d
this code gives me around 31 as the depth for that base(i.e G in my example). Looking closely in IGV i figured out that this 21 is basically 21 = 6 + 15
where 6 is the actual reads that has covered this base position(hence base) and 15 means that these reads have not covered that base at that position, but since the genomeCoverageBed tool calculates depth of feature coverage it also includes all those reads which skips that particular base.
I would provide you with an image to make it more clear
I would like to know how can i get only the read counts (here 6) for that particular base rather than getting the entire feature depth. I ran the above command and it gave me the per base genome coverage output for every base. What else should i do next to extract only the rread counts for that particular base. I can view it in IGV but want it in text format.
Any command line or code will be helpful.
Hope to hear from you soon.
Regards Varun
What do you mean by reads that skipped the particular base? Do you mean reads that have a mismatch at that base? So 6 reads have a matching G, but 15 have mismatches?
Nevermind my last question. I had a brain fart there. You are looking at mate pair reads, so you mean the inter-mate sequences that were skipped.
Hi Well if you look at the graph it says read count as 6 but actually genomeCoverageBed reports it as 21. So that actually means that only 6 reads covered that base (G) whereas other 15 were skipped but are always counted in genomeCoverageBed(thin blue lines).