I would like to make a primer file in FASTA format by grepping some information (primer template, primer ID, and left and right primer sequences) from the primer3 output below. Goutham Atla kindly gave me this command, which worked well:

grep -E "PRIMER_RIGHT_0_SEQUENCE|PRIMER_LEFT_0_SEQUENCE|SEQUENCE_ID" test.fasta | paste - - - | awk '{ gsub("\047|,","",$0); print ">"$6"-left\n"$2"\n" ">"$6"-right\n"$4}'

This time I would like to design three candidate pairs for each target, so I modified his command like this:

grep -E "PRIMER_RIGHT_\d_SEQUENCE|PRIMER_LEFT_\d_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x.out | paste - - - - - - - - | awk '{ gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}' > xgrep_3primers.out

Unfortunately it does not work: it fails with errors, and I have no clue why. Please help me out.
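For reference, this is why the original two-primer command lines up: `paste - - -` joins every three consecutive grep hits into one tab-separated line, and once gsub has deleted the quotes and commas, each key lands in an odd field and its value in the following even field ($2, $4, $6). The sketch below pushes three placeholder lines (the sequences AAA/CCC, the ID `id1` and the file name `two_primers_demo.fa` are made up for illustration) through the same `paste | awk` steps:

```shell
# Placeholder lines in the same 'KEY': 'VALUE', shape as the primer3 dump.
# paste - - - joins every three lines into one tab-separated record; after gsub
# the fields are $1=PRIMER_LEFT_0_SEQUENCE: $2=AAA $3=PRIMER_RIGHT_0_SEQUENCE: $4=CCC
# $5=SEQUENCE_ID: $6=id1, which is why the awk program prints $6, $2 and $4.
printf '%s\n' \
  "'PRIMER_LEFT_0_SEQUENCE': 'AAA'," \
  "'PRIMER_RIGHT_0_SEQUENCE': 'CCC'," \
  "'SEQUENCE_ID': 'id1'," |
  paste - - - |
  awk '{ gsub("\047|,","",$0); print ">"$6"-left\n"$2"\n" ">"$6"-right\n"$4}' |
  tee two_primers_demo.fa
```

This prints `>id1-left`, `AAA`, `>id1-right`, `CCC` on four lines. The same reasoning explains the eight-line layout: assuming grep returns the keys in the alphabetical order shown in the excerpt, the ID ends up in $14 and the template in $16.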
Below is an example of the source primer3 output:
{'PRIMER_INTERNAL_NUM_RETURNED': 0L,
...
'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
...
'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
...
'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',
...
'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
...
'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
...
'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
...
'SEQUENCE_ID': 'chr1:114713809-114714010',
...
'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg'}
None
The desired output format is:
>chr1:114713809-114714010
TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg
>chr1:114713809-114714010-L0
ATGGCAAATACACAGAGGAAGC
>chr1:114713809-114714010-L1
GCAAATACACAGAGGAAGCCTT
>chr1:114713809-114714010-L2
TGATGGCAAATACACAGAGGAAG
>chr1:114713809-114714010-R0
AGATGGTGAAACCTGTTTGTTG
>chr1:114713809-114714010-R1
AGATGGTGAAACCTGTTTGTTG
>chr1:114713809-114714010-R2
AGATGGTGAAACCTGTTTGTTG
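If positional fields feel fragile, one alternative is to let awk read the dump directly and look values up by key name, so the number of pairs and the position of SEQUENCE_ID no longer matter. This is a sketch under the assumption that the file has the layout shown in the excerpt (one `'KEY': 'VALUE',` entry per line, each result dict closed by `}`); the helper name `primer3_to_fasta` and the file names `x_sample.out` / `primers_by_key.fa` are hypothetical, and the sample file is a trimmed copy of the excerpt:

```shell
# primer3_to_fasta (hypothetical helper): look values up by key name.
# Assumes one 'KEY': 'VALUE', entry per line and a closing } after each result dict.
primer3_to_fasta() {
  awk '
    # Print one target: ID + template, then every left and right primer by index.
    function flush(   i) {
      print ">" id; print tmpl
      for (i = 0; i < n; i++) if (i in left)  { print ">" id "-L" i; print left[i] }
      for (i = 0; i < n; i++) if (i in right) { print ">" id "-R" i; print right[i] }
      split("", left); split("", right); n = 0; id = ""; tmpl = ""
    }
    # Remember whether this line closes a dict, then strip braces, quotes and
    # commas so the line reads "KEY: VALUE".
    { closes = ($0 ~ /[}]/); gsub("[{}\047,]", "") }
    $1 ~ /^PRIMER_(LEFT|RIGHT)_[0-9]+_SEQUENCE:$/ {
      split($1, k, "_")        # k[2] is LEFT or RIGHT, k[3] is the pair index
      if (k[2] == "LEFT") left[k[3]] = $2; else right[k[3]] = $2
      if (k[3] + 1 > n) n = k[3] + 1
    }
    $1 == "SEQUENCE_ID:"       { id = $2 }
    $1 == "SEQUENCE_TEMPLATE:" { tmpl = $2 }
    closes && id != ""         { flush() }
    END                        { if (id != "") flush() }
  ' "$@"
}

# Trimmed copy of the excerpt (the "..." lines are left out).
cat > x_sample.out <<'EOF'
{'PRIMER_INTERNAL_NUM_RETURNED': 0L,
'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',
'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'SEQUENCE_ID': 'chr1:114713809-114714010',
'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg'}
None
EOF

primer3_to_fasta x_sample.out > primers_by_key.fa
cat primers_by_key.fa
```

On a real dump the call would be `primer3_to_fasta x.out > xgrep_3primers.out`. Because pairs are collected by index, the same script should cope with more or fewer than three pairs per target and simply print whatever primer3 returned.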
First, my question is: why are you using \d instead of 0?
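On the `\d` point: `grep -E` uses POSIX extended regular expressions, which have no `\d` shorthand. GNU grep does not treat it as a digit class in this mode (depending on the version it matches a literal `d` or warns about a stray backslash), so in either case the PRIMER_* alternatives find no lines. A bracket expression such as `[0-9]` works in any `grep -E`; GNU grep's `-P` option (Perl-compatible regexes, where available) would also accept `\d`. A minimal check, using four lines copied from the excerpt into a scratch file (`sample_lines.txt` and `matched_lines.txt` are illustrative names):

```shell
# Four lines copied from the primer3 excerpt in the question.
cat > sample_lines.txt <<'EOF'
'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'SEQUENCE_ID': 'chr1:114713809-114714010',
EOF

# [0-9]+ is a POSIX bracket expression (one or more digits), so it is portable;
# (LEFT|RIGHT) folds the left and right alternatives into a single pattern.
grep -E "PRIMER_(LEFT|RIGHT)_[0-9]+_SEQUENCE|SEQUENCE_ID" sample_lines.txt > matched_lines.txt

# All four lines should match.
cat matched_lines.txt
```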
I would like to use a regular expression to cover all the primers (LEFT_0, LEFT_1, LEFT_2, etc.). I also tried listing the individual numbers and ended up with the same error, shown below:
sp@sp-ThinkPad-X220:~/multiplex_primer_design/2016_03_18_DC_Hot$ grep -E "PRIMER_LEFT_0_SEQUENCE|PRIMER_LEFT_1_SEQUENCE|PRIMER_LEFT_2_SEQUENCE|PRIMER_RIGHT_0_SEQUENCE|PRIMER_RIGHT_1_SEQUENCE|PRIMER_RIGHT_2_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x.out | paste - - - - - - - - | awk '{ gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}' > xgrep_3primers.out
awk: cmd. line:1: { gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: { gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}
awk: cmd. line:1: ^ syntax error
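These messages point at the awk print statement rather than at grep. In `">"$14"-L0"\n"$2"\n"` the quote after `-L0` closes the string, so the `\n` that follows sits outside any string literal; awk reads that bare backslash as a misplaced line continuation ("backslash not last character on line") and then gives up with a syntax error. In the original two-primer command the `\n` is inside the string (`"-left\n"`), which is why that one parses. Below is a minimal sketch of the corrected pipeline, assuming exactly three pairs per target and the key order shown in the excerpt (LEFT_0 to LEFT_2, RIGHT_0 to RIGHT_2, SEQUENCE_ID, SEQUENCE_TEMPLATE). Besides the quoting, it swaps `\d` for `[0-9]`, which `grep -E` understands, and also strips the `}` that ends the SEQUENCE_TEMPLATE line. The input `x_sample.out` is an illustrative, trimmed copy of the excerpt:

```shell
# Trimmed copy of the primer3 excerpt from the question (the "..." lines are left out).
cat > x_sample.out <<'EOF'
{'PRIMER_INTERNAL_NUM_RETURNED': 0L,
'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',
'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
'SEQUENCE_ID': 'chr1:114713809-114714010',
'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg'}
None
EOF

# Changes vs. the failing command:
#  - [0-9] replaces \d, which grep -E does not support;
#  - "[\047,}]" also deletes the closing brace after the template;
#  - every \n now sits inside a quoted awk string ("-L0\n", not "-L0"\n).
grep -E "PRIMER_RIGHT_[0-9]_SEQUENCE|PRIMER_LEFT_[0-9]_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x_sample.out |
  paste - - - - - - - - |
  awk '{ gsub("[\047,}]","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0\n"$2"\n" ">"$14"-L1\n"$4"\n" ">"$14"-L2\n"$6"\n" ">"$14"-R0\n"$8"\n" ">"$14"-R1\n"$10"\n" ">"$14"-R2\n"$12 }' > xgrep_3primers.out
```

If primer3 returns fewer than three pairs for some target, the eight-line `paste` grouping drifts out of step for every target after it.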