I downloaded a number of bigWig files from the ENCODE project and converted them to bed files.
I did this as follows:
bigWigToWig file.bigWig file.wig
wig2bed -x < file.wig > file.bed
However, the intervals differ between files:
ENCFF001.bed
chr1 2999998 2999999 id-1 1.000000
chr1 2999999 3000000 id-2 1.000000
ENCFF002.bed
chr1 3001400 3001500 id-1 0.140000
chr1 3001600 3001700 id-2 0.140000
My first question: why do the files start at different points in the genome? And why do genome-wide bed files always seem to start at around 3000000 rather than 1?
I then downloaded a separate dataset from a source other than ENCODE.
HET.bed
chr1 3049360 3053345 Region_1 0 0
chr1 3136664 3138809 Region_2 0 0
What I would like to do is align the intervals of the bed files so that I can analyse them in parallel.
The interval width is arbitrary; it can be 100 or 1000. All I really need is to be able to manipulate the files consistently so that the data looks something like this:
ENCFF001 ENCFF002 HET
chr1 3000000 3000500 1 2 2
chr1 3001000 3001500 1 1 3
#column values are examples and not from real data
So can anyone help me convert a series of bed files to consistent intervals?
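One way to picture the target table is as a binning step followed by a join. The sketch below is only an illustration of that idea, not what any particular tool does: `bin_bed` and `merge_tracks` are hypothetical names, the 500 bp window is arbitrary, the score is assumed to sit in column 5 (as in the snippets above), and a length-weighted mean is just one possible way to summarise a window.

```python
from collections import defaultdict


def bin_bed(lines, window=500):
    """Average the column-5 score of BED lines over fixed-width windows.

    Returns {(chrom, window_start): mean_score}, where the mean is weighted
    by how many bases of each interval fall inside the window.
    (Hypothetical helper; the weighting scheme is an assumption.)
    """
    total = defaultdict(float)  # sum of score * overlapping bases
    covered = defaultdict(int)  # number of bases with a score
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])
        score = float(fields[4])
        first = (start // window) * window  # window containing `start`
        for w in range(first, end, window):
            overlap = min(end, w + window) - max(start, w)
            if overlap > 0:
                total[(chrom, w)] += score * overlap
                covered[(chrom, w)] += overlap
    return {key: total[key] / covered[key] for key in total}


def merge_tracks(tracks, window=500):
    """Join binned tracks into rows of (chrom, start, end, value, value, ...).

    `tracks` maps a track name to the dict returned by bin_bed. Windows that
    a track does not cover get None. Chromosomes sort as plain strings.
    """
    keys = sorted({key for binned in tracks.values() for key in binned})
    rows = []
    for chrom, start in keys:
        values = [tracks[name].get((chrom, start)) for name in tracks]
        rows.append((chrom, start, start + window, *values))
    return rows


if __name__ == "__main__":
    encff002 = bin_bed([
        "chr1 3001400 3001500 id-1 0.140000",
        "chr1 3001600 3001700 id-2 0.140000",
    ])
    het = bin_bed(["chr1 3049360 3053345 Region_1 0 0"])
    for row in merge_tracks({"ENCFF002": encff002, "HET": het}):
        print(*row, sep="\t")
```

Because every file is reduced to the same window grid first, the join is a simple lookup on (chromosome, window start). For genome-wide files a dedicated tool will be much faster than this pure-Python loop.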
Thanks so much for the answer. However, I cannot seem to carry the occupancy values over from the original files to the new ones. So far I have:
And yet the newly formatted HET-5k.bed doesn't have any occupancy values
Unlike the original HET.bed
Thanks again
In your HET.bed snippet, there are no elements that overlap with chr1:0-5000 or chr1:5000-10000. That seems correct to me.
If you just want to see the "meat" of the overlaps and not bother where there are no overlaps between windows and HET regions, add --skip-unmapped to bedmap, e.g.:
Thanks again but I'm confused. Now HET-5k.bed looks like this:
but no occupancy values... the original looked like this: