You could convert the ChIP-seq peaks to sorted BED and use closest-features to report the nearest upstream and downstream peaks to each of your sorted regions, along with their distances; just add the --dist
operand:
closest-features --dist regions.bed peaks.bed > answer.bed
If you want to save a lot of time, you can quickly parallelize the work by adding the --chrom <chromosome>
option and using bedextract to get a fast list of chromosomes, using GNU Parallel to farm out the work:
bedextract --list-chr regions.bed \
| parallel "closest-features --dist --chrom ${} regions.bed peaks.bed > p_${}.bed"
Then zip all the results together with a multiset union:
bedops --everything p_*.bed > answer.bed
If formatting is an issue, add the --delim <delimiter>
operand to closest-features
, to replace the default delimiter with one of your choice, e.g., \t
or similar. This can make processing with awk
or other downstream scripts a little quicker.
The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?