Using the BEDOPS bedops and bedmap tools for the set and mean operations, and the Kent utilities to build 1 kb windows:
$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 - > hg38.1kb.bed
$ bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }' > union.bed
$ bedmap --echo --mean --delim '\t' hg38.1kb.bed union.bed > answer.bed
Replace hg38 with your reference genome name (e.g., mm10 for mouse).
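A note on the awk step in the third command: bedmap score operations such as --mean read the score from the fifth column of the map file, which is why the score is moved from column 4 to column 5 with a "." placeholder ID in column 4. Here is a minimal sketch on toy data, assuming your elements file is four columns with the score last. The coordinates and scores are invented, and to_bed5 is only an illustrative helper name:

```shell
# Hypothetical helper: move a column-4 score into column 5,
# inserting "." as a placeholder ID in column 4.
to_bed5() {
  awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }'
}

# Toy four-column input (chrom, start, end, score); values are invented.
printf 'chr1\t100\t200\t3.5\nchr1\t150\t300\t1.5\n' | to_bed5
```

If your file already has an ID in column 4 and the score in column 5, you should be able to skip this step.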
You can also use process substitutions to avoid creating intermediate files, bundling the set and mean operations into a single bedmap command:
$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ bedmap --echo --mean --delim '\t' <(fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 -) <(bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }') > answer.bed
However, the first set of commands is probably easier to read, troubleshoot, and modify.
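If you want to sanity-check the numbers while troubleshooting, a brute-force version of the per-window mean can be sketched in plain awk and run on a tiny input. This is not how bedmap works internally. It is only a toy check, assuming the default rule that any overlap of at least one base counts, and window_mean and the chrToy data are made up for illustration. It prints NAN for windows with no overlapping element, and its number formatting will not match bedmap's output exactly:

```shell
# Hypothetical helper: brute-force mean of column-5 scores per window.
# Usage: window_mean windows.bed elements.bed5
# Only suitable for small toy inputs; assumes the elements file is not empty.
window_mean() {
  awk -v FS="\t" -v OFS="\t" '
    # First file read (the elements): remember each interval and its score.
    NR == FNR { c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; v[NR] = $5 + 0; n = NR; next }
    # Second file read (the windows): average scores of overlapping elements.
    {
      sum = 0; k = 0
      for (i = 1; i <= n; i++)
        if (c[i] == $1 && s[i] < $3 + 0 && e[i] > $2 + 0) { sum += v[i]; k++ }
      print $1, $2, $3, (k ? sum / k : "NAN")
    }
  ' "$2" "$1"
}

# Toy inputs: three 1 kb windows and three scored elements (invented values).
tmp=$(mktemp -d)
printf 'chrToy\t0\t1000\nchrToy\t1000\t2000\nchrToy\t2000\t3000\n' > "$tmp/windows.bed"
printf 'chrToy\t100\t200\t.\t3\nchrToy\t900\t1100\t.\t1\nchrToy\t1500\t1600\t.\t5\n' > "$tmp/union.bed"
window_mean "$tmp/windows.bed" "$tmp/union.bed"
```

On this toy input the first window averages the scores 3 and 1, the second averages 1 and 5 (the element at 900-1100 spans both windows), and the third has no overlapping elements.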
Another demonstration (and related Biostars answer) shows use of the --sum operation instead of --mean, though the principle is the same: How can I bin my bed files into 500bp bins?
The bedmap command offers several score summary operations in addition to --sum and --mean, including median, min, max, and weighted and trimmed means. Run bedmap --help or review the documentation for a full listing.
It worked. Thank you!!!!!!