4.7 years ago
Shaurya Jauhari
Hi.
I have a BED file that was generated with the following command:
bedtools makewindows -g ../bedtools2/genomes/human.hg19.genome -w 2000 > hg19_2K_bins.bed
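For readers unfamiliar with `bedtools makewindows -w`, here is a rough Python sketch of the tiling it performs. It assumes the genome file has two whitespace-separated columns (chromosome name, length) and that the last window of each chromosome is cut short at the chromosome end. The function names `make_windows` and `read_genome_file` are hypothetical, not part of bedtools.

```python
def make_windows(genome, width):
    """Tile each chromosome into consecutive half-open [start, end) windows.

    `genome` is a list of (chrom, size) pairs, like the two columns of a
    bedtools .genome file. The last window of a chromosome is truncated at
    the chromosome end (assumed to mirror `makewindows -w`).
    """
    windows = []
    for chrom, size in genome:
        for start in range(0, size, width):
            windows.append((chrom, start, min(start + width, size)))
    return windows


def read_genome_file(path):
    """Parse a genome file with one 'chrom<TAB>size' pair per line."""
    genome = []
    with open(path) as fh:
        for line in fh:
            if line.strip():
                chrom, size = line.split()[:2]
                genome.append((chrom, int(size)))
    return genome


if __name__ == "__main__":
    # Toy genome instead of hg19: chrA is 5000 bp, chrB is 1500 bp.
    for chrom, start, end in make_windows([("chrA", 5000), ("chrB", 1500)], 2000):
        print(chrom, start, end, sep="\t")
```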
The goal is to count the reads from a BAM file that fall into each of these intervals, so that I can visualize the genome-wide distribution of counts bin by bin. I am aware of the bamCoverage tool from deepTools, but the problem is that it merges adjacent bins when they have the same count.
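Counting reads per fixed-width bin is simple once each read is reduced to an interval. The sketch below assumes reads are already available as 0-based, half-open `(chrom, start, end)` tuples (extracting them from the BAM, e.g. with pysam, is not shown) and counts a read in every bin it overlaps. This overlap rule is an assumption; on the command line, `bedtools intersect` with its `-c` option (number of B hits per A interval) does a comparable overlap count. `count_reads_per_bin` and `fill_bins` are hypothetical helper names.

```python
from collections import Counter


def count_reads_per_bin(reads, width):
    """Count reads overlapping each fixed-width bin.

    `reads` yields (chrom, start, end) half-open intervals. A read that
    crosses a bin boundary is counted once in every bin it touches.
    Keys of the result are (chrom, bin_start).
    """
    counts = Counter()
    for chrom, start, end in reads:
        first_bin = start // width
        last_bin = (end - 1) // width  # bin holding the read's last base
        for b in range(first_bin, last_bin + 1):
            counts[(chrom, b * width)] += 1
    return counts


def fill_bins(windows, counts):
    """Emit (chrom, start, end, count) for every window, zeros included."""
    return [(c, s, e, counts.get((c, s), 0)) for c, s, e in windows]


if __name__ == "__main__":
    windows = [("chr1", 0, 2000), ("chr1", 2000, 4000), ("chr1", 4000, 6000)]
    reads = [("chr1", 10, 60), ("chr1", 1990, 2040), ("chr1", 2500, 2550)]
    print("Chrom", "Start", "End", "Score", sep="\t")
    for row in fill_bins(windows, count_reads_per_bin(reads, 2000)):
        print(*row, sep="\t")
```

Because every window is written out, including zero-count ones, adjacent bins with equal counts stay separate, which is the layout asked for.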
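# --effectiveGenomeSize is only used for RPGC normalization, so it has no effect with --normalizeUsing None.
# (deepTools lists 2913022398 for GRCh38; its GRCh37/hg19 value is 2864785220.)
bamCoverage --bam testMe.bam \
-o testMe_2k.bedgraph \
--binSize 2000 \
--normalizeUsing None \
--effectiveGenomeSize 2913022398 \
--outFileFormat bedgraph \
--maxFragmentLength 30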
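The merging described above amounts to run-length encoding the bins: neighbouring bins on the same chromosome that touch and share a score become one interval. The following is a small illustration of that behaviour, not deepTools' actual code; `merge_equal_neighbours` is a hypothetical name and the input is assumed sorted by chromosome and start.

```python
def merge_equal_neighbours(bins):
    """Collapse runs of adjacent bins that share a score into one interval.

    `bins` is a list of (chrom, start, end, score) tuples sorted by chrom
    and start. Two bins merge only if they are on the same chromosome,
    touch (previous end == next start) and have equal scores.
    """
    merged = []
    for chrom, start, end, score in bins:
        if merged:
            prev_chrom, prev_start, prev_end, prev_score = merged[-1]
            if prev_chrom == chrom and prev_end == start and prev_score == score:
                # Extend the previous interval instead of starting a new one.
                merged[-1] = (prev_chrom, prev_start, end, prev_score)
                continue
        merged.append((chrom, start, end, score))
    return merged


if __name__ == "__main__":
    bins = [("chr1", 0, 2000, 5), ("chr1", 2000, 4000, 5), ("chr1", 4000, 6000, 3)]
    print(merge_equal_neighbours(bins))
```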
The output I desire is something like:
Chrom Start End Score
chr1 0 2000 34
chr1 2000 4000 46
...
where the values in the last column (Score) are the read counts from my BAM file. I have two questions:
- Is there an alternative tool for this, or a workaround?
- What if I have a bedGraph/bigWig file with scores instead of a BAM?
Please advise. Thanks.
Why is it a problem for you if adjacent bins are merged when they have the same score? Why not just post-process the bedGraph file if that's really an issue?
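Such post-processing could be a single pass that cuts each bedGraph interval at every multiple of the bin size. This is a minimal sketch, assuming the bedGraph came from a run with the same `--binSize` (so every merged interval is a run of equal-valued bins and each piece can inherit its score). It is not valid for an arbitrary bedGraph with variable-length intervals. `split_into_bins` is a hypothetical name.

```python
def split_into_bins(intervals, width):
    """Split bedGraph intervals at every multiple of `width`.

    `intervals` holds (chrom, start, end, score) tuples. Each piece keeps
    the score of the interval it came from, which is only correct when the
    intervals are merged runs of equal-valued fixed-width bins.
    """
    out = []
    for chrom, start, end, score in intervals:
        pos = start
        while pos < end:
            boundary = (pos // width + 1) * width  # next bin edge after pos
            stop = min(boundary, end)
            out.append((chrom, pos, stop, score))
            pos = stop
    return out


if __name__ == "__main__":
    merged = [("chr1", 0, 4000, 5), ("chr1", 4000, 5000, 3)]
    for row in split_into_bins(merged, 2000):
        print(*row, sep="\t")
```

Since it streams through the file once, the cost is proportional to the number of output bins rather than to the product of two file sizes.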
I wouldn't call it a problem, but I want the output to have scores in uniform, fixed-size bins. Post-processing the bedGraph file is certainly doable; I just want to know whether an existing tool already does this. In R, I am looping over every line of the source file (the bamCoverage output) and the target file (the fixed bins), but this is computationally too expensive.
To illustrate my point, would something like this be advisable? "targetBED" is the file of fixed-size genomic regions, while "sourceBED" has variable-sized regions, each with a score.
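Looping over every pair of source and target lines costs time proportional to their product. If both files are sorted by chromosome and start, a single forward sweep suffices. The sketch below assumes bedGraph-style sources (non-overlapping intervals) and, as its own assumption, scores each target bin with the base-pair-weighted mean of source scores over the whole bin, treating uncovered bases as 0. An existing tool for a similar job is `bedtools map -a targetBED -b sourceBED -c 4 -o mean`, though its `mean` is an unweighted average of the overlapping records. `map_scores` is a hypothetical name.

```python
from collections import defaultdict


def map_scores(targets, sources):
    """Base-pair-weighted mean of source scores over each target bin.

    targets: (chrom, start, end) bins sorted by start within each chrom.
    sources: (chrom, start, end, score) non-overlapping bedGraph records.
    Uncovered bases count as 0. Returns (chrom, start, end, score) rows.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, score in sources:
        by_chrom[chrom].append((start, end, score))
    for records in by_chrom.values():
        records.sort()

    result = []
    current, src, j = None, [], 0
    for chrom, t_start, t_end in targets:
        if chrom != current:
            current, src, j = chrom, by_chrom.get(chrom, []), 0
        # Sources ending at or before this bin cannot overlap later bins either.
        while j < len(src) and src[j][1] <= t_start:
            j += 1
        total = 0.0
        k = j
        while k < len(src) and src[k][0] < t_end:
            s_start, s_end, score = src[k]
            overlap = min(s_end, t_end) - max(s_start, t_start)
            if overlap > 0:
                total += score * overlap
            k += 1
        result.append((chrom, t_start, t_end, total / (t_end - t_start)))
    return result


if __name__ == "__main__":
    targets = [("chr1", 0, 2000), ("chr1", 2000, 4000), ("chr1", 4000, 6000)]
    sources = [("chr1", 0, 1000, 2.0), ("chr1", 1000, 3000, 4.0), ("chr1", 3000, 3500, 10.0)]
    for row in map_scores(targets, sources):
        print(*row, sep="\t")
```

The pointer `j` only moves forward within a chromosome, so each source record is skipped once and revisited only for the bins it actually overlaps.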