What is the correct way to use GNU parallel with Primer3?
9.8 years ago
lunchboxwu ▴ 30

Hi

I need to design primers for around 40,000 sequences. After doing this task with Primer3, I found that it took a very long time.

I tried to speed up Primer3 with GNU parallel, but I could not get GNU parallel to split the input file and run several primer3 processes at once. Somehow primer3 still ran on only 1 core.

My command is as follows:

cat fasta.p3in | parallel --round-robin -j 12 --pipe --recend "=" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

Could anyone tell me the correct way to use GNU Parallel with Primer3? Thanks a lot!

primer3 gnu parallel multithread • 3.7k views

What would a typical command line (without parallel) look like? Would it be something like

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta_part1.p3in

As a side note, have you looked through the parallel guide? Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them


Hi, Ying W:

Thanks.

I've read through the GNU Parallel tutorial and the post Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them.

The command I used is based on the BLAT example in that Biostar post.

The command line (without parallel) of primer3 is:

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta.p3in > fasta.p3out

and the records in *.p3in (primer3 input format) look like this:

SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ATATGGCGATAGTAAAATTTTGAAAAAAAAAAAGAAAAATTTTAGAAGCAAAATTTTCCGTCATCTTGAATTTTGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTAAATTTAACACAAAACTTTTTACCGTGTGGGAAAATTTCTAATAAACAGGATTTATCAGATTTATCAATTGCAAGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=

Each record ends with a line containing only '='.

Any ideas?

9.8 years ago
ole.tange ★ 4.5k

Your biggest mistake was probably that your records contain = on every line, but only \n=\n is a record separator, so --recend "=" splits in the wrong places. Using the command wc or --files cat is great for debugging that kind of problem.
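To make the difference concrete, here is a minimal sketch that needs only grep. The sample file is hypothetical (two shortened records in the same format as the question); it compares the number of lines that merely contain = with the number of lines that are exactly =, which is what \n=\n matches.

```shell
# Hypothetical sample: two shortened records in primer3 input format.
sample=$(mktemp)
printf '%s\n' \
  'SEQUENCE_ID=1' \
  'SEQUENCE_TARGET=20,17' \
  '=' \
  'SEQUENCE_ID=2' \
  'SEQUENCE_TARGET=20,17' \
  '=' > "$sample"

# Lines containing "=" anywhere: every TAG=VALUE line plus every terminator.
lines_with_equals=$(grep -c '=' "$sample")

# Lines consisting of "=" only: the real record terminators.
record_count=$(grep -c '^=$' "$sample")

echo "lines containing '=': $lines_with_equals"   # prints 6
echo "records: $record_count"                     # prints 2
```

Running the second grep on the real fasta.p3in should report the number of sequences you expect (around 40,000 here).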

Your second mistake is that --block-size defaults to 1M, so the first instance may simply gobble up everything.

This ought to work (untested, as I have neither access to fasta.p3in nor to primer3):

cat fasta.p3in | parallel -N1 --round-robin --pipe --recend "\n=\n" --cat /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

You can possibly leave out --cat if primer3 reads from STDIN. If GNU Parallel itself takes up significant time, increase -N1: with 40000 records it is probably OK to split into bigger chunks than 1 record.
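As a rough guide for picking -N, the number of chunks GNU Parallel has to hand out is the record count divided by the -N value, rounded up. The helper below is only an illustration; chunk_count is a made-up name, not part of GNU Parallel.

```shell
# Hypothetical helper: number of chunks for a given record count and -N value
# (integer ceiling division).
chunk_count() {
  records=$1
  per_chunk=$2
  echo $(( (records + per_chunk - 1) / per_chunk ))
}

chunk_count 40000 1    # prints 40000
chunk_count 40000 10   # prints 4000
chunk_count 40000 100  # prints 400
```

Fewer, larger chunks mean less splitting overhead, at the cost of coarser load balancing between the primer3 processes.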


Thank you for your help, ole.tange. You are my lifesaver!

You're right: I should use "\n=\n" as the delimiter, and I should also set the number of records per chunk for parallel.

I finally managed to run primer3 with parallel. The command line is the following:

cat fasta.p3in | parallel -N10 --round-robin --pipe --recend "\n=\n" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out
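A cheap sanity check after a run like this is to compare record counts between input and output. This is a sketch under two assumptions: primer3 writes its default Boulder-IO output, in which each output record also ends with a line containing only =, and the function names below are made up for illustration. Because several primer3 processes write concurrently, the output records may not come back in input order, so counting is safer than comparing positions.

```shell
# Count Boulder-IO records: one line consisting solely of "=" ends each record.
# "|| true" keeps the function from failing when the count is 0.
count_records() {
  grep -c '^=$' "$1" || true
}

# Compare the record counts of a primer3 input file and its output file.
check_counts() {
  in_n=$(count_records "$1")
  out_n=$(count_records "$2")
  if [ "$in_n" -eq "$out_n" ]; then
    echo "OK: $in_n records in both files"
  else
    echo "MISMATCH: $in_n input records, $out_n output records"
    return 1
  fi
}

# Usage with the file names from this thread:
# check_counts fasta.p3in fasta.p3out
```

If the counts differ, a record was probably split or dropped, and the --recend setting is the first thing to re-check.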