Question

What is the correct way to use GNU parallel with Primer3?

9.8 years ago

lunchboxwu ▴ 30

Hi

I need to design primers for around 40,000 sequences. Running this with Primer3, I found that it takes a very long time.

I tried to speed up Primer3 with GNU parallel, but I could not get GNU parallel to split the input file and run the jobs on multiple cores. Somehow primer3 still ran on only one core.

My command is as follows:

cat fasta.p3in | parallel --round-robin -j 12 --pipe --recend "=" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

Could anyone tell me the correct way to use GNU Parallel with Primer3? Thanks a lot!

primer3 gnu parallel multithread

What would a typical command line (without parallel) look like? Would it be something like

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta_part1.p3in

As a side note, have you looked through the parallel guide, Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them?

Reply by Ying W, 9.8 years ago

Hi, Ying W:

Thanks.

I've read through the GNU Parallel tutorial and the post Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them.

The command I used is based on the BLAT example in that Biostars post.

The primer3 command line (without parallel) is:

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta.p3in > fasta.p3out

and the records in *.p3in (Primer3 input format) look like this:

SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ATATGGCGATAGTAAAATTTTGAAAAAAAAAAAGAAAAATTTTAGAAGCAAAATTTTCCGTCATCTTGAATTTTGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTAAATTTAACACAAAACTTTTTACCGTGTGGGAAAATTTCTAATAAACAGGATTTATCAGATTTATCAATTGCAAGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=

There is a line containing only '=' at the end of each record.

Any ideas?

Reply by lunchboxwu, 9.8 years ago

Ram · Accepted Answer · 2015-03-21

9.8 years ago

ole.tange ★ 4.5k

Your biggest mistake was probably that your records contain = on every line, but only \n=\n is a record separator. Using wc as the command, or --files with cat, is great for debugging this kind of problem.
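To see the difference concretely, here is a small check on the two example records from the question. It uses grep as a swapped-in stand-in for the wc / --files debugging mentioned above, and the temporary file exists only for illustration. It counts lines that contain "=" anywhere versus lines that are exactly "=", the latter being what a \n=\n record end matches.

```shell
#!/usr/bin/env bash
# Recreate the two example records from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ATATGGCGATAGTAAAATTTTGAAAAAAAAAAAGAAAAATTTTAGAAGCAAAATTTTCCGTCATCTTGAATTTTGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTAAATTTAACACAAAACTTTTTACCGTGTGGGAAAATTTCTAATAAACAGGATTTATCAGATTTATCAATTGCAAGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
EOF

# Every TAG=VALUE line contains "=", so --recend "=" offers a possible
# split point on every line, even in the middle of a record.
lines_with_eq=$(grep -c '=' "$sample")

# Only lines consisting of exactly "=" end a record; this is what
# --recend "\n=\n" matches.
records=$(grep -cx '=' "$sample")

echo "lines containing '=': $lines_with_eq"
echo "record terminators:   $records"
rm -f "$sample"
```

For the sample this reports 12 lines containing "=" but only 2 record terminators. With GNU Parallel installed, an untested check in the spirit of the answer would be something like parallel --pipe -N1 --recend '\n=\n' wc -l < fasta.p3in, which should print one line count per record (6 for records shaped like the examples).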

Your second mistake is that --block-size defaults to 1M, so the first instance may simply gobble up everything.
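A rough way to see whether the 1M default leaves most job slots idle is to estimate how many size-based blocks the input yields: file size divided by block size, rounded up. This is a sketch under two assumptions: that GNU Parallel's M suffix means 1024*1024 bytes, and that stretching each block to the next record end changes the count only slightly, so the result is approximate. estimate_chunks is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of size-based blocks, ceil(bytes / block_size).
# Assumes the block size defaults to 1M = 1048576 bytes; GNU Parallel extends
# each block to the next record end, so this is only an estimate.
estimate_chunks() {
  local bytes=$1
  local block_size=${2:-1048576}
  echo $(( (bytes + block_size - 1) / block_size ))
}

# Only runs if the input file from the question is present.
if [ -f fasta.p3in ]; then
  bytes=$(wc -c < fasta.p3in)
  echo "approx. blocks at 1M: $(estimate_chunks "$bytes")"
fi
```

If the estimate comes out below the -j 12 from the question, some of the 12 job slots receive no input; a file under 1 MB would go entirely to one primer3_core instance, which would be consistent with primer3 running on a single core.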

This ought to work (untested, as I have neither access to fasta.p3in nor to primer3):

cat fasta.p3in | parallel -N1 --round-robin --pipe --recend "\n=\n" --cat /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

You can possibly leave out --cat if primer3 reads from STDIN. If GNU Parallel itself takes up significant time, increase -N1: with 40,000 records it is probably fine to split into chunks bigger than 1 record.
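To make the -N record grouping concrete, here is a sketch of the same grouping done with awk instead of GNU Parallel (a swapped-in technique, not what parallel does internally). It copies Boulder-IO records into files of n records each, treating a line that is exactly "=" as the record end. The function name split_records and the chunk_NNNN.p3in file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a Boulder-IO file into chunks of n records,
# written as <outdir>/chunk_0000.p3in, chunk_0001.p3in, ...
# A record ends at a line that is exactly "=" (the "\n=\n" separator).
split_records() {
  local input=$1 n=$2 outdir=$3
  mkdir -p "$outdir"
  awk -v n="$n" -v dir="$outdir" '
    BEGIN { chunk = 0; count = 0; file = sprintf("%s/chunk_%04d.p3in", dir, chunk) }
    # Copy every line into the current chunk file.
    { print > file }
    # After a record terminator, start a new file once n records are written.
    $0 == "=" {
      count++
      if (count % n == 0) {
        close(file)
        chunk++
        file = sprintf("%s/chunk_%04d.p3in", dir, chunk)
      }
    }
  ' "$input"
}
```

For example, split_records fasta.p3in 10 chunks would group records the way -N10 does, giving 40,000 / 10 = 4,000 chunk files for 40,000 records.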

Thank you for your help, ole.tange. You are my lifesaver!

You're right: I should use "\n=\n" as the delimiter, and I also needed to set the number of records per job for parallel.

I finally managed to run primer3 with parallel. The command line is as follows:

cat fasta.p3in | parallel -N10 --round-robin --pipe --recend "\n=\n" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out
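After a run like this, a quick sanity check is to confirm that the output holds one record per input record. This sketch assumes primer3_core's default Boulder-IO output ends each record with a line containing only "=" and echoes the SEQUENCE_ID tag; because parallel jobs can finish in any order, it compares sorted IDs rather than record order. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: primer3_core's Boulder-IO output ends every record with a
# line that is exactly "=" and echoes the input SEQUENCE_ID tag.

# Number of records in a Boulder-IO file.
count_records() {
  grep -cx '=' "$1"
}

# Sorted SEQUENCE_ID values; sorting makes the comparison order-independent,
# since parallel jobs may finish (and print) in any order.
sequence_ids() {
  grep '^SEQUENCE_ID=' "$1" | cut -d= -f2- | sort
}

# Prints "ok" and returns 0 if input and output contain the same records.
check_run() {
  local in=$1 out=$2
  if [ "$(count_records "$in")" -ne "$(count_records "$out")" ]; then
    echo "record count differs: $in vs $out" >&2
    return 1
  fi
  if [ "$(sequence_ids "$in")" != "$(sequence_ids "$out")" ]; then
    echo "SEQUENCE_ID sets differ" >&2
    return 1
  fi
  echo "ok"
}

# Only runs if both files from the thread are present.
if [ -f fasta.p3in ] && [ -f fasta.p3out ]; then
  check_run fasta.p3in fasta.p3out
fi
```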

Reply by lunchboxwu, 9.8 years ago