Question

Using Gnu Parallel For Bedtools

3

11.3 years ago

GouthamAtla 12k

I am trying to run the bedtools multicov command under GNU parallel. The original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to run the above command using GNU parallel, but when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is passed to a separate job, so the processes that get started look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of one process receiving all the BAM files together. Hence Q1_Counst.bed is overwritten unpredictably by the competing jobs. Could anyone help me get the right command? My server has around 30 cores.

parallel linux bedtools bash • 5.4k views

ADD COMMENT • link updated 11.2 years ago by ole.tange ★ 4.5k • written 11.3 years ago by GouthamAtla 12k

score 3 · Answer 1 · 2014-02-05

3

11.3 years ago

Pierre Lindenbaum 166k

Split your BED file into chunks using split:

split -l100 anon.bed TMPBED

and then call multiBamCov with each BED chunk:

ls TMPBED* | parallel   multiBamCov -bams f1.bam  f2.bam -bed '{}'  '>' out.{}.bed

ADD COMMENT • link 11.3 years ago by Pierre Lindenbaum 166k
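
The split approach leaves one output file per chunk, so a final merge step is needed to get a single counts file. Below is a minimal sketch of the split, process, and merge cycle. It assumes a toy BED file, and `fake_multicov` is a hypothetical stand-in for `multiBamCov -bams f1.bam f2.bam -bed CHUNK` so that the example runs without bedtools or BAM files; the chunks are processed in a plain loop here, and only the chunking and merging logic is the point.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir"

# Toy BED file with 250 intervals (hypothetical data standing in for anon.bed).
for i in $(seq 1 250); do
  printf 'chr1\t%d\t%d\n' "$((i * 100))" "$((i * 100 + 50))"
done > anon.bed

# Split into chunks of 100 lines: TMPBEDaa, TMPBEDab, TMPBEDac.
split -l 100 anon.bed TMPBED

# Hypothetical stand-in for "multiBamCov -bams f1.bam f2.bam -bed CHUNK":
# it only appends a dummy count column of 0 to every interval.
fake_multicov() {
  awk 'BEGIN { OFS = "\t" } { print $0, 0 }' "$1"
}

# One output file per chunk, named like the answer's out.{}.bed pattern.
for chunk in TMPBED*; do
  fake_multicov "$chunk" > "out.$chunk.bed"
done

# The glob expands in sorted order, which matches split's suffix order,
# so plain concatenation restores the original interval order.
cat out.TMPBED*.bed > merged.bed

n_chunks=$(ls TMPBED* | wc -l | tr -d ' ')
n_merged=$(wc -l < merged.bed | tr -d ' ')
echo "chunks: $n_chunks, merged lines: $n_merged"
```

With 250 intervals and chunks of 100 lines this produces three chunks, and the merged file has the same 250 intervals in their original order. With real data, the temporary TMPBED* and out.TMPBED*.bed files can be deleted after the merge.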

2

But that is much the same as

split -l100 anon.bed TMPBED

for bed in TMPBED*; do multiBamCov -bams f1.bam f2.bam -bed "$bed" > "${bed}_out.bed" & done

which creates one shell subprocess per TMPBED* chunk. Is there any other advantage to running GNU parallel here?

ADD REPLY • link 11.3 years ago by GouthamAtla 12k
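
One detail worth checking in a loop like this is how the output file name expands. The shell reads `$bed_out` as a variable named `bed_out`, not as `$bed` followed by `_out`, so braces are needed when appending a suffix that starts with an underscore. A minimal sketch, where the chunk name `TMPBEDaa` is just an example value:

```shell
#!/usr/bin/env bash
# Unset variables expand to an empty string here (the default shell behaviour);
# under "set -u" the unbraced form would abort with an error instead.
set +u

bed=TMPBEDaa
unset bed_out

# Without braces the shell looks up "bed_out", which is unset,
# so only the literal ".bed" remains.
without_braces="$bed_out.bed"

# With braces the variable name ends at "}", and "_out.bed" is appended.
with_braces="${bed}_out.bed"

echo "without braces: '$without_braces'"
echo "with braces:    '$with_braces'"
```

In the unbraced form every background job would write to the same file named `.bed`. Jobs started with `&` also keep running after the loop ends, so a `wait` after `done` is needed before the outputs are read.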

1

With GNU parallel you can limit the number of jobs running at once, run jobs on a remote server and fetch the results back, re-run only the jobs that failed, ...

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 166k

0

Thanks.. It is working. :)

ADD REPLY • link 11.3 years ago by GouthamAtla 12k

score 2 · Answer 2 · 2014-02-06

If you can get multiBamCov to read from stdin, you can avoid the tmp files:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  '>' out.{#}.bed

Or if you just want all output merged into a single file:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  >out.bed

I have never used multiBamCov, so if -bed stdin does not work, you might try:

-bed /dev/stdin
-bed '<( cat )'
-bed -