I have a bigWig file representing ChIP-seq peaks and a BED file containing a set of small genomic regions (~100 bp). How can I intersect the two to get, for each entry in the BED file, the closest peak from the bigWig, along with its distance from the region? I was thinking of converting the bigWig to a BED file and then using closestBed. What do you think?
The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?
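You could convert the ChIP-seq peaks to a sorted BED file and use the BEDOPS closest-features tool to report the nearest upstream and downstream peaks for each of your sorted regions. Adding the --dist operand also reports the distance to each of those peaks, e.g. closest-features --dist regions.bed peaks.bed > answer.bed (the file names here are placeholders).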
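To make the idea concrete, here is a minimal Python sketch of a nearest-upstream/nearest-downstream lookup on one chromosome. It is not how BEDOPS is implemented, and its distance convention (the gap in bases between half-open intervals, unsigned) is an assumption that may differ from what closest-features prints. The function name `nearest_peaks` is made up for illustration, "upstream" simply means lower coordinates (strand is ignored), and the peaks are assumed to be sorted and not to overlap one another.

```python
from bisect import bisect_left, bisect_right

def nearest_peaks(region, peaks):
    """Find the nearest peak on each side of a region.

    region: (start, end) in half-open BED coordinates.
    peaks:  list of (start, end) on the same chromosome, sorted by start
            and assumed not to overlap each other.
    Returns (left_peak, left_dist, right_peak, right_dist, overlapping).
    """
    r_start, r_end = region
    starts = [p[0] for p in peaks]
    ends = [p[1] for p in peaks]  # also sorted, because peaks do not overlap

    # i = number of peaks that end at or before the region start
    i = bisect_right(ends, r_start)
    left = peaks[i - 1] if i > 0 else None

    # j = index of the first peak that starts at or after the region end
    j = bisect_left(starts, r_end)
    right = peaks[j] if j < len(peaks) else None

    left_dist = r_start - left[1] if left is not None else None
    right_dist = right[0] - r_end if right is not None else None

    # Peaks between the two indices overlap the region itself
    overlapping = peaks[i:j]
    return left, left_dist, right, right_dist, overlapping

peaks = [(100, 200), (500, 600), (900, 1000)]
result = nearest_peaks((300, 400), peaks)
```

Both lookups are binary searches, so each region costs O(log n) once the peaks are sorted, which is the reason sorted input matters for this kind of query.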
If you want to save a lot of time, you can parallelize the work by chromosome: use bedextract --list-chr to get a fast list of chromosomes, then use GNU Parallel to farm out one closest-features job per chromosome with the --chrom <chromosome> option, writing each result to its own p_<chromosome>.bed file.
Then zip all the results together with a multiset union:
bedops --everything p_*.bed > answer.bed
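As a rough illustration of what a multiset union of sorted inputs means, the following Python sketch merges per-chromosome result lists while keeping every element, duplicates included. It is only an analogy for the command above, not BEDOPS code; the function name `everything` and the sample records are hypothetical.

```python
import heapq

def everything(*sorted_inputs):
    """Merge already-sorted lists of (chrom, start, end) records.

    Every record from every input is kept (duplicates too), and the
    output stays sorted by chromosome name, then start, then end.
    """
    return list(heapq.merge(*sorted_inputs))

# Hypothetical per-chromosome results, each already sorted
p_chr1 = [("chr1", 100, 200), ("chr1", 500, 600)]
p_chr2 = [("chr2", 50, 150)]
p_dup = [("chr1", 100, 200)]

merged = everything(p_chr1, p_chr2, p_dup)
```

Because each input is already sorted, the merge never has to re-sort the combined data.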
If formatting is an issue, add the --delim <delimiter> operand to closest-features, to replace the default delimiter with one of your choice, e.g., \t or similar. This can make processing with awk or other downstream scripts a little quicker.
bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
name - name field from bed, which should be unique
size - size of bed (sum of exon sizes
covered - # bases within exons covered by bigWig
sum - sum of values over all bases covered
mean0 - average over bases with non-covered bases counting as zeroes
mean - average over just covered bases
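The difference between mean0 and mean is easiest to see with numbers. The sketch below simply applies the column descriptions above; it is not taken from the tool's source. The function name `summarize` is hypothetical, and returning 0.0 when nothing is covered is an assumption made for the example.

```python
def summarize(size, covered_values):
    """Compute the summary columns for one BED entry.

    size:           total number of bases in the BED entry
    covered_values: bigWig values for the bases that have data
    """
    covered = len(covered_values)
    total = sum(covered_values)
    # mean0 divides by every base, so uncovered bases count as zero
    mean0 = total / size if size else 0.0
    # mean divides only by the bases that have a value (0.0 if none: assumption)
    mean = total / covered if covered else 0.0
    return {"size": size, "covered": covered, "sum": total,
            "mean0": mean0, "mean": mean}

# A 100-base region in which 40 bases carry a signal of 2.5
stats = summarize(100, [2.5] * 40)
```

Here mean0 comes out at 1.0 and mean at 2.5, so sparse coverage pulls mean0 down while leaving mean unchanged.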
Thanks, Pierre. I already knew this tool, but it's not exactly what I want. It's important for me to know the distance to the closest peak (and, in a perfect world, the distance to the closest peak both upstream and downstream of the region of interest).