Here's a way to smooth the signal on the command line, which gives you quantitative control over the inputs and outputs.
1) Convert your fixed- or variable-step WIG to BED with BEDOPS wig2bed:
$ wig2bed < signal.wig > signal.bed
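For intuition about what this conversion does, here is a rough awk sketch of the WIG-to-BED coordinate arithmetic. WIG positions are 1-based and each value covers span bases (default 1), so a value at position p becomes the half-open BED interval [p-1, p-1+span). The function name wig_to_bed_sketch and the id-N placeholder in column 4 are hypothetical. The part that matters is that the signal value lands in column 5, the score column that bedmap --mean reads later. The sketch only understands fixedStep and variableStep sections and is not a substitute for wig2bed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT wig2bed: converts fixedStep/variableStep WIG on
# stdin to 5-column BED (chrom, start, end, placeholder id, value) on stdout.
wig_to_bed_sketch() {
  awk -v OFS="\t" '
    # Read key=value pairs (chrom, start, step, span) from a section header.
    function parse_header(   i, kv) {
      chrom = ""; start = 1; step = 1; span = 1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        else if (kv[1] == "start") start = kv[2] + 0
        else if (kv[1] == "step") step = kv[2] + 0
        else if (kv[1] == "span") span = kv[2] + 0
      }
    }
    # Skip blank, comment, track and browser lines.
    NF == 0 || /^#/ || $1 == "track" || $1 == "browser" { next }
    $1 == "variableStep" { mode = "var"; parse_header(); next }
    $1 == "fixedStep" { mode = "fixed"; parse_header(); pos = start; next }
    # variableStep data line: "<1-based position> <value>"
    mode == "var" { s = $1 - 1; print chrom, s, s + span, "id-" (++n), $2; next }
    # fixedStep data line: "<value>"; positions advance by step from start
    mode == "fixed" { s = pos - 1; print chrom, s, s + span, "id-" (++n), $1; pos += step }
  '
}

printf 'variableStep chrom=chr1 span=5\n11 0.5\n101 2\n' | wig_to_bed_sketch
```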
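2) Fetch chromosome sizes for your genome of interest using the Kent Utilities fetchChromSizes, and create a sorted BED file with BEDOPS sort-bed. The awk step drops any sequence name containing an underscore (such as the _alt, _random and chrUn_ contigs) and writes each remaining chromosome as one interval spanning its full length: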
$ fetchChromSizes hg38 | awk -v OFS="\t" '($1!~/_/){ print $1, "0", $2 }' | sort-bed - > hg38.bed
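To see what the awk filter keeps, you can feed the same awk program a few toy lines in the two-column name/length layout that fetchChromSizes prints. The names and lengths below are invented for illustration, not real hg38 values. The helper name sizes_to_bed_sketch is hypothetical, and LC_ALL=C sort -k1,1 -k2,2n stands in for sort-bed here on the assumption that it gives the same lexicographic order for simple BED3 input like this; use sort-bed on real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same awk filter as step 2, followed by a C-locale
# sort standing in for sort-bed (assumed equivalent for simple BED3 input).
sizes_to_bed_sketch() {
  # Drop names containing "_", emit chrom<TAB>0<TAB>length, then sort.
  awk -v OFS="\t" '($1 !~ /_/) { print $1, "0", $2 }' | LC_ALL=C sort -k1,1 -k2,2n
}

# Toy "name<TAB>length" lines; the underscore name is filtered out.
printf 'chr2\t300\nchrUn_example\t50\nchr1\t500\nchrM\t20\n' | sizes_to_bed_sketch
```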
3) Use BEDOPS bedops with the --chop option to generate bins of the desired width. For example, we can generate disjoint windows that are 100 kb wide:
$ bedops --chop 100000 hg38.bed > hg38.100k.bins.bed
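The chopping itself is simple arithmetic. Each chromosome interval [0, L) is cut into consecutive windows of the requested width, so a chromosome of length L yields ceil(L / width) bins. Below is an awk sketch of that behavior for the BED3 file from step 2. The name chop_sketch is hypothetical, and it assumes (as I understand --chop to behave) that the last window on each chromosome is cut short at the chromosome end rather than padded. For real work, use bedops --chop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: chop each BED interval on stdin into windows of
# width $1; the final window of each interval is truncated at its end.
chop_sketch() {
  awk -v OFS="\t" -v w="$1" '{
    end = $3 + 0
    for (s = $2 + 0; s < end; s += w) {
      e = s + w
      if (e > end) e = end   # last window stops at the interval end
      print $1, s, e
    }
  }'
}

printf 'chr1\t0\t250\n' | chop_sketch 100
```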
4) Finally, smooth the signal with BEDOPS bedmap, using the signal BED file from step 1, the bins from step 3 (made from the chromosomes in step 2), and a range that also collects signal upstream and downstream of each window.
For example, for each bin, we can calculate the mean signal over the window plus 50 kb on either side of it:
$ bedmap --echo --mean --range 50000 hg38.100k.bins.bed signal.bed > answer.bed
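The answer.bed file will contain, for each bin, its coordinates followed by the mean signal over the bin and 50 kb on either side of it. By default bedmap separates the echoed bin from the mean with a | character, and bins with no signal in range report NAN.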
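To make the --range behavior concrete, here is an awk sketch of what I understand bedmap --echo --mean --range R to compute. Each bin [start, end) is widened to [start - R, end + R), every signal element that overlaps the widened window by at least one base contributes its column-5 score, and the bin gets the plain (unweighted) mean of those scores, or NAN if there are none. The function name padded_mean_sketch and its argument order are hypothetical; it is a naive all-pairs scan meant only for toy inputs, and its number formatting may differ from bedmap's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: padded_mean_sketch RANGE BINS.bed SIGNAL.bed
# Prints "<bin line>|<mean>" per bin, mimicking bedmap --echo --mean --range.
padded_mean_sketch() {
  awk -v r="$1" '
    # First file (signal): keep chrom, start, end and the column-5 score.
    FILENAME == ARGV[1] { n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; v[n] = $5 + 0; next }
    # Second file (bins): widen the bin by r on both sides, then average the
    # scores of signal elements overlapping the widened window.
    {
      lo = $2 - r; hi = $3 + r; sum = 0; k = 0
      for (i = 1; i <= n; i++)
        if (c[i] == $1 && s[i] < hi && e[i] > lo) { sum += v[i]; k++ }
      print $0 "|" (k ? sum / k : "NAN")
    }
  ' "$3" "$2"
}

bins=$(mktemp); sig=$(mktemp)
printf 'chr1\t0\t100\nchr1\t100\t200\n' > "$bins"
printf 'chr1\t90\t100\tid-1\t1\nchr1\t110\t120\tid-2\t3\n' > "$sig"
padded_mean_sketch 0 "$bins" "$sig"    # each bin sees only its own element: means 1 and 3
padded_mean_sketch 50 "$bins" "$sig"   # both bins see both elements: means 2 and 2
rm -f "$bins" "$sig"
```

The toy run shows the smoothing effect directly: with no range, neighboring bins keep their distinct values; with a 50 bp range, each bin also averages in its neighbor's signal and the two values converge.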
The smaller the window and the wider the range, the smoother the resulting post-processed signal. Like using coarse sandpaper on a fine hardwood, though, you can sand away too much. You could use a single chromosome as sample input to see how tuning the bin width and range changes the smoothing.