Question

bigwig : peak distance from specific genomic region

10.3 years ago

Nicolas Rosewick 11k

Hi,

I have a bigWig representing ChIP-seq peaks and a BED file containing a set of small genomic regions (~100 bp). How can I intersect the bigWig and the BED file to get, for each entry in the BED file, the closest peak from the bigWig, along with its distance from the region in the BED file? I was thinking of converting the bigWig to a BED file and then using closestBed. What do you think?

thanks

ChIP-Seq bigwig distance closest bed • 9.8k views

updated 3.0 years ago by Ram 44k • written 10.3 years ago by Nicolas Rosewick 11k

The bigWig format typically stores continuous signal data rather than intervals like a BED file does. Are you somehow calling peaks from the bigWig file?

10.1 years ago by Ryan Dale 5.0k
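If the bigWig only holds signal, one rough way to get intervals out of it is to dump it to bedGraph with UCSC's bigWigToBedGraph and keep runs of bins whose value passes a cutoff. The sketch below does only the thresholding step; the helper name bedgraph_to_peaks, the cutoff of 5, and the file names are illustrative assumptions, and a plain cutoff is not a substitute for a real peak caller.

```shell
# Hypothetical helper: threshold a bedGraph (chrom, start, end, value) that is
# sorted by chromosome and start, merging touching bins that pass the cutoff.
# Usage: bedgraph_to_peaks <cutoff> <file.bedGraph>
bedgraph_to_peaks() {
  awk -v OFS='\t' -v cutoff="$1" '
    $4 + 0 >= cutoff + 0 {
      if (in_peak && $1 == cur_chrom && $2 + 0 <= cur_end) {
        # bin touches or overlaps the open interval: extend it
        if ($3 + 0 > cur_end) cur_end = $3 + 0
      } else {
        # close the previous interval (if any) and open a new one
        if (in_peak) print cur_chrom, cur_start, cur_end
        cur_chrom = $1; cur_start = $2 + 0; cur_end = $3 + 0; in_peak = 1
      }
    }
    END { if (in_peak) print cur_chrom, cur_start, cur_end }
  ' "$2"
}

# Assumed usage (file names are placeholders):
#   bigWigToBedGraph signal.bw signal.bedGraph
#   bedgraph_to_peaks 5 signal.bedGraph | sort-bed - > peaks.bed
```

The resulting peaks.bed can then go into whichever closest-feature tool you prefer.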

Ram · Answer 1 · 2014-07-08

You could convert the ChIP-seq peaks to sorted BED and use closest-features to report the nearest upstream and downstream peaks to each of your sorted regions, along with their distances; just add the --dist operand:

closest-features --dist regions.bed peaks.bed > answer.bed
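Since this reports both neighbours, you may want to reduce each line to the single nearest peak. A minimal sketch follows, assuming the default "|" delimiter and the layout region | upstream peak | upstream distance | downstream peak | downstream distance, with NA where no neighbour exists on that side. That layout is my reading of the --dist output, so check it against your BEDOPS version; the helper name nearest_from_dist is made up for illustration.

```shell
# Hypothetical helper: read closest-features --dist output on stdin and print
# region, nearest peak, and absolute distance as tab-separated columns.
# Assumed input layout: region|upstream|dist_up|downstream|dist_down
nearest_from_dist() {
  awk -F'|' -v OFS='\t' '
    {
      # absolute distances; -1 marks a missing neighbour (NA)
      up   = ($3 == "NA") ? -1 : ($3 < 0 ? -$3 : $3)
      down = ($5 == "NA") ? -1 : ($5 < 0 ? -$5 : $5)
      if (up < 0 && down < 0) { print $1, "NA", "NA"; next }
      if (down < 0 || (up >= 0 && up <= down)) print $1, $2, up
      else print $1, $4, down
    }'
}

# Assumed usage:
#   closest-features --dist regions.bed peaks.bed | nearest_from_dist > nearest.tsv
```

Ties go to the upstream peak here. closest-features also has a --closest option that makes this choice for you, which is probably simpler if you do not need both neighbours.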

If you want to save a lot of time, you can quickly parallelize the work by adding the --chrom <chromosome> option and using bedextract to get a fast list of chromosomes, using GNU Parallel to farm out the work:

bedextract --list-chr regions.bed \
 | parallel "closest-features --dist --chrom {} regions.bed peaks.bed > p_{}.bed"

Then zip all the results together with a multiset union:

bedops --everything p_*.bed > answer.bed

If formatting is an issue, add the --delim <delimiter> operand to closest-features, to replace the default delimiter with one of your choice, e.g., \t or similar. This can make processing with awk or other downstream scripts a little quicker.

Ram · Answer 2 · 2014-07-08

EDIT: I quickly wrote this tool. It should fulfill your needs:

$  echo -e "1\t1000\t20000\n3\t100\t200\nUn\t10\t11"  |\
  java -jar dist/biostar105754.jar -B path/to/All_hg19_RS_noprefix.b


#no data found for  Un  10  11
1   1000    1001    0.0 1   1000    20000
3   100 101 0.0 3   100 200

This does not give you the "closest" peak, but http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ contains a tool named bigWigAverageOverBed:

bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
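
As a rough follow-up, the output table can be filtered to find which BED regions have no signal under them at all, which are the ones where a nearest-peak search is still needed. This sketch assumes out.tab is tab-separated with the six columns in the order listed above; the helper name uncovered_regions is hypothetical.

```shell
# Hypothetical helper: print the names of regions whose "covered" column
# (3rd field of bigWigAverageOverBed output) is zero.
# Usage: uncovered_regions <out.tab>
uncovered_regions() {
  awk -F'\t' '$3 == 0 { print $1 }' "$1"
}

# Assumed usage (file names are placeholders):
#   bigWigAverageOverBed in.bw in.bed out.tab
#   uncovered_regions out.tab > no_signal_names.txt
```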