I am currently trying to write a primer-design program in Python built on primer3-py.
I have a dictionary whose keys are arbitrary sequence names and whose values hold the primer designs and metadata for each sequence. For example:
{'seq1': {'PRIMER_LEFT_EXPLAIN': 'considered 1483, GC content failed 178, low tm 998, high tm 125, high hairpin stability 2, ok 167',
'PRIMER_RIGHT_EXPLAIN': 'considered 1493, GC content failed 141, low tm 1174, high tm 38, ok 128',
'PRIMER_PAIR_EXPLAIN': 'considered 148, unacceptable product size 141, ok 7',
'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 0,
'PRIMER_PAIR_NUM_RETURNED': 5,
…},
'seq2': {'PRIMER_LEFT_EXPLAIN': 'considered 1483, GC content failed 178, low tm 998, high tm 125, high hairpin stability 2, ok 167',
'PRIMER_RIGHT_EXPLAIN': 'considered 1493, GC content failed 141, low tm 1174, high tm 38, ok 128',
'PRIMER_PAIR_EXPLAIN': 'considered 148, unacceptable product size 141, ok 7',
'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 0,
'PRIMER_PAIR_NUM_RETURNED': 5,
…}
}
(The above dict has been shortened for brevity’s sake)
Is it possible to 'grep' all values from the nested dictionary based on the pattern 'PRIMER_LEFT_*_SEQUENCE', where the '*' is a wildcard?
I can successfully extract some of the primer designs using bracket notation like so:
for i, j in design_output.items():
    print(i, j['PRIMER_LEFT_0_SEQUENCE'])
Primer3 designs multiple primers per sequence, and I would like to capture all of them. My goal is to write a loop that iterates through the dict and extracts every instance of a novel primer design.
The output I am imagining might look something like this:
{'seq1': ['atcgatcgta', 'cgagcatct', 'cgcgcgaatgc'],
'seq2': ['cgcgccagtcg', 'cgcgatatacgat', 'taatcgatcg']
}
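One way this kind of wildcard lookup could be sketched is with the standard library's fnmatch module, which applies shell-style patterns to strings. The sample dictionary and the helper name grep_values below are hypothetical stand-ins for illustration; the sequences are made up, and only the key names follow the pattern shown in the question.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the real primer3 output; sequences are invented.
design_output = {
    "seq1": {
        "PRIMER_LEFT_NUM_RETURNED": 2,
        "PRIMER_LEFT_0_SEQUENCE": "atcgatcgta",
        "PRIMER_RIGHT_0_SEQUENCE": "ttgacctgaa",
        "PRIMER_LEFT_1_SEQUENCE": "cgagcatct",
    },
    "seq2": {
        "PRIMER_LEFT_NUM_RETURNED": 2,
        "PRIMER_LEFT_0_SEQUENCE": "cgcgccagtcg",
        "PRIMER_LEFT_1_SEQUENCE": "cgcgatatacgat",
    },
}


def grep_values(results, pattern):
    """Map each sequence name to the values whose key matches the pattern.

    grep_values is a hypothetical helper; the pattern uses shell-style
    wildcards and is matched case-sensitively.
    """
    return {
        name: [value for key, value in design.items() if fnmatchcase(key, pattern)]
        for name, design in results.items()
    }


left_primers = grep_values(design_output, "PRIMER_LEFT_*_SEQUENCE")
print(left_primers)
```

Note that '*' matches any run of characters, not only digits. If stricter matching is needed, a regular expression such as PRIMER_LEFT_\d+_SEQUENCE used with re.fullmatch would be one option.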
@barslmn, thank you so much! This was exactly what I was looking for!
For completeness (and for anyone else who may stumble across this post with the same problem) this is the code I used to address the problem above:
With an output like so:
(Please ignore the primer sequences as they are a contrived example)
Looks great ^^. Here are some slightly faster alternatives.
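A sketch of one possible alternative, not taken from the thread itself: skip pattern matching and build the keys directly. This assumes the left primers are numbered consecutively from 0 up to PRIMER_LEFT_NUM_RETURNED - 1, as the sample keys suggest. The function name and the sample data are hypothetical.

```python
# Hypothetical sample data shaped like the dictionary in the question.
design_output = {
    "seq1": {
        "PRIMER_LEFT_NUM_RETURNED": 2,
        "PRIMER_LEFT_0_SEQUENCE": "atcgatcgta",
        "PRIMER_LEFT_1_SEQUENCE": "cgagcatct",
    },
    "seq2": {
        "PRIMER_LEFT_NUM_RETURNED": 1,
        "PRIMER_LEFT_0_SEQUENCE": "cgcgccagtcg",
    },
}


def left_primers_by_index(results):
    """Collect left primer sequences by constructing each key from its index.

    Assumes keys PRIMER_LEFT_<i>_SEQUENCE exist for every i below the
    reported PRIMER_LEFT_NUM_RETURNED count.
    """
    extracted = {}
    for name, design in results.items():
        count = design.get("PRIMER_LEFT_NUM_RETURNED", 0)
        extracted[name] = [design[f"PRIMER_LEFT_{i}_SEQUENCE"] for i in range(count)]
    return extracted


primers = left_primers_by_index(design_output)
print(primers)
```

Each lookup here is a direct dictionary access, so no key in the inner dict has to be scanned or compared against a pattern.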
Thank you again for your help!