Dear all, I'm trying to annotate a huge file with HOMER, since I need information about a few million sites. I would like to parallelize the process in batches of, say, 10,000 lines of my .bed file. Is there a straightforward way to do so? I tried to get this done with GNU parallel, but I can't figure out if and how I can pass arguments through a pipe to HOMER's annotatePeaks.pl command:
annotatePeaks.pl mybig.bed hg19 > output.txt
The idea would be to split the .bed file into N pieces, run multiple jobs (both in parallel and in sequence), and then merge their annotations into a single output. It might be trivial, but I'm really confused about argument piping in this context. The other option would be to write a bash script that creates those pieces as files and only then iterates over them by name, but I was looking for something more elegant. Thank you in advance.
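Something along these lines is what I'm after (an untested sketch; it assumes GNU parallel's --pipe/--cat options, that annotatePeaks.pl writes its annotation table to stdout with progress on stderr, and that each run's header line starts with "PeakID"):

# Feed mybig.bed to GNU parallel in blocks of 10,000 lines. --cat writes each
# block to a temporary file and substitutes its name for {}, since
# annotatePeaks.pl expects a file name rather than stdin; -k keeps the
# output blocks in input order.
parallel --pipe --cat -N 10000 -k 'annotatePeaks.pl {} hg19' < mybig.bed \
    | awk 'NR == 1 || $0 !~ /^PeakID/' > output.txt

The awk step keeps only the first header line, because every parallel job would otherwise emit its own. The file-based alternative I mentioned would be the same idea spelled out: split -l 10000 mybig.bed chunk_ to create the pieces, then parallel 'annotatePeaks.pl {} hg19 > {}.ann' ::: chunk_* and a merge of the .ann files.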
Thank you for your answer, and sorry for the late reply. I tried both solutions but got the same error:
I was not able to troubleshoot it...