I want to get base coverage for specific regions (I'll provide a bed file). However, the output is not in a standard BedGraph format that I can export into R for visualisation.
Q: How do I draw density plot of coverage over the regions from the outputs generated by the coverage tool?
Q: Alternatively, how to convert the non-standard outputs to a proper BedGraph? For example, the Sushi R package takes a bedgraph input file.
The problem is that this tool doesn't take a BED file filtering the regions that I'm interested. Therefore, it'd be too time consuming to calculate for all bases in the genome.
@ Alex: How can I convert my .sam file to standard .bed file? I have tried sam2bed < input.sam > output.bed it gives me output file but the column 5 always contains the same value (which is 99). According to UCSC bed format this is score column and to me this look strange that for every region the score is same. Then by using your mentioned command I obtain the .bedgraph file which I later use for ChromImpute. Problem is that when ChromeImpute convert these raw signal into certain sized bin signal (25 bp) then for every region I get the 0 signal.. I have tried many ways (using Pyisco, make wig files, genomeCoverageBed etc..) but I am unable to solve this problem. Can you help me with this one? Thanks.
To generate bedGraph output from Bedtools coverage command you need to specify the -counts flag, for example:
bedtools coverage -a sample.bed -b sample.bam -counts > sample.bedgraph
Strictly speaking each row of a bedGraph file contains the chromosome name, the start position, the end position and then some value, which in your case would be the coverage, therefore you will want to keep only these fields, so if your had a bed file with six columns, you would want to keep only the first 3 columns (chrom, start, end) and the last column (coverage):
It's not really clear to me from the original question that this is what they want. This would give one value for each genome region. I suspect they want "wiggly" values for each genome region. But could be wrong.
The problem is that this tool doesn't take a BED file filtering the regions that I'm interested. Therefore, it'd be too time consuming to calculate for all bases in the genome.
Fair enough, I didn't catch the subtlety in your question, but James Ashmore did.
I think this might be what you want to do - genomecov and then pull out the regions you want using bedtools intersect. But kind of hard to tell.
Thanks. I think your approach is the best.
Note: -abam is correct.