Hi, I'm very new to Python and I'm still struggling with this problem. For it, I already created a function called reverse_complement, which computes the reverse complement of a given input DNA sequence.
Problem 3 (Virtual PCR):
You will write a program that performs "Virtual PCR" using pairs of PCR primers and a template sequence, much in the manner of UCSC's In-Silico PCR tool (http://genome.ucsc.edu/cgi-bin/hgPcr?command=start).
Your program (call it your_last_name_virtual_pcr.py) will load and assemble the pBR322 plasmid DNA sequence contained in hw_input\pBR322.txt - this will serve as your large target sequence. Your program will also load pairs of PCR primer sequences contained in hw_input\pBR322_PCR_products.txt. Note that each line consists of a forward primer sequence and a reverse primer sequence separated by a comma. You will use the unaltered forward primer sequence and the reverse-complemented reverse primer sequence to find the start and end positions of the predicted PCR product and then extract its sequence. Remember to import your reverse_complement function that you created in Problem 2 and use it to alter the reverse primer sequences.
Create an output file named hw_output\pBR322_PCR_products.txt and on each line write the forward primer, the reverse-complemented reverse primer and the predicted PCR product. Separate the fields with commas.
pBR322 primers.txt - I read it in as one long string:
"TCGGGCTCGCCACTTCGGGCTCA,GAGTTGCATGATAAAGAAGACAGTCA
ATGGCCCGCTTTATCAGAAGCCAGACA,GTCAGTGAGCGAGGAAGCGGAAGAGCGC
AATCAGTGAGGCACCTATCTCAGCGATC,ACTCTAGCTTCCCGGCAACAATTAATAGA
CGGTGTGAAATACCGCACAGATGC,GAGCGAGGAAGCGGAAGAGCGCCTGATG"
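For readers following along, here is a minimal sketch of one way to turn primer text like the above into a list of [forward, reverse] pairs. The names primer_text and parse_primer_pairs are made up for illustration, and the sketch assumes one comma-separated pair per line, as the assignment describes.

```python
# Hypothetical stand-in for the text read from the primer file
# (assumes one "forward,reverse" pair per line).
primer_text = """TCGGGCTCGCCACTTCGGGCTCA,GAGTTGCATGATAAAGAAGACAGTCA
ATGGCCCGCTTTATCAGAAGCCAGACA,GTCAGTGAGCGAGGAAGCGGAAGAGCGC
AATCAGTGAGGCACCTATCTCAGCGATC,ACTCTAGCTTCCCGGCAACAATTAATAGA
CGGTGTGAAATACCGCACAGATGC,GAGCGAGGAAGCGGAAGAGCGCCTGATG
"""


def parse_primer_pairs(text):
    """Split primer text into a list of [forward, reverse] lists."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        forward, reverse = line.split(",")
        pairs.append([forward.strip(), reverse.strip()])
    return pairs


cleaned_primer_list = parse_primer_pairs(primer_text)
print(cleaned_primer_list)
```

When reading from the real file, the same loop could run directly over the open file object instead of over text.splitlines().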
The pseudocode I've written so far is:
1. Read the input file/import the function
2. Create a for loop in order to obtain the complement sequence
3. Then I 'cleaned' the list by turning it into a list of lists so now it looks like this
[['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'ACTCTAGCTTCCCGGCAACAATTAATAGA'], ['CGGTGTGAAATACCGCACAGATGC', 'GAGCGAGGAAGCGGAAGAGCGCCTGATG']]
4. Now I have to use my reverse_complement function to change the reverse sequences (every other sequence starting from the second sequence) into reverse-complemented reverse sequences, but I'm having trouble doing this:
    fwd_and_rsvcomp_rsv_primers = []
    for pair in cleaned_primer_list:
        if pair[0]:
            t = pair[0]
            fwd_and_rsvcomp_rsv_primers.append(t)
        else:
            s = pair[1]
            fwd_and_rsvcomp_rsv_primers.append(s)
My output in the Python shell was this (only the first string, the forward sequence, from each of the lists):
['TCGGGCTCGCCACTTCGGGCTCA', 'ATGGCCCGCTTTATCAGAAGCCAGACA', 'AATCAGTGAGGCACCTATCTCAGCGATC', 'CGGTGTGAAATACCGCACAGATGC']
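The loop above only ever appends pair[0], because "if pair[0]:" is true for any non-empty string, so the else branch never runs. A minimal sketch of one possible fix follows: treat each pair as a unit and apply the function to pair[1]. The reverse_complement defined here is only a stand-in for the Problem 2 function (not shown in this thread) and assumes uppercase A, C, G and T only.

```python
def reverse_complement(seq):
    """Stand-in for the Problem 2 function; assumes uppercase A/C/G/T only."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


cleaned_primer_list = [
    ["TCGGGCTCGCCACTTCGGGCTCA", "GAGTTGCATGATAAAGAAGACAGTCA"],
    ["ATGGCCCGCTTTATCAGAAGCCAGACA", "GTCAGTGAGCGAGGAAGCGGAAGAGCGC"],
]

fwd_and_rsvcomp_rsv_primers = []
for pair in cleaned_primer_list:
    forward = pair[0]                          # used unaltered
    reverse_rc = reverse_complement(pair[1])   # reverse primer, reverse-complemented
    fwd_and_rsvcomp_rsv_primers.append([forward, reverse_rc])

print(fwd_and_rsvcomp_rsv_primers)
```

Keeping the two primers together in one inner list makes it easier to look up both positions in the template later.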
Next I have to use the forward primers and the reverse-complemented reverse primers to find the start and end positions of the predicted PCR product and then extract its sequence. But I'm stuck at the step where I have to apply the reverse complement function, and I have no idea how to approach the step after that. Can anyone help?
Thank you for your efforts!
Sorry but I'm having trouble understanding your comment. I talked to my professor and she said I did problem 2 right (making the function). But for the most part how do I get from here
to here
Starting with the second sequence, every other sequence is the reverse-complemented reverse primer that corresponds to the forward primer just before it. From the latter list of sequences, will I be able to find the start and end positions of the predicted PCR product and then extract its sequence?
Key to understanding Coryza's answer is knowing what "index" does and also how a "for" loop lets you do the same thing to a bunch of things in a list.
Think of it like an assembly line.
The "for" loop lets you do the same thing over and over to similar objects.
If you're like me, it helps to imagine how you would do the task using a printout of the sequence and a pencil.
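To make the "index" and "for" loop ideas concrete, here is a small sketch on a made-up toy sequence (not pBR322). It uses Python's built-in str.index and str.find: both return the 0-based position of the first match, but index raises ValueError when there is no match while find returns -1.

```python
# Toy sequence and fragments, invented for illustration only.
template = "GGGATTACACCCGATTACATTT"

# Position of the first occurrence (counting from 0).
position = template.index("GATTACA")
print(position)

# A for loop repeats the same lookup for every item in a list.
fragments = ["GATTACA", "CCC", "TTTT"]
positions = []
for fragment in fragments:
    # find returns -1 instead of raising an error when nothing matches
    positions.append(template.find(fragment))
print(positions)
```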
Sounds like you have a good grasp on the basic problem - what you want to achieve. Next step is to design a solution.
Hang in there. Programming is one of those things that helps you get better at solving problems. It can make you smarter overall.
Thank you for the encouragement!
Hi, sorry I was not clear enough. To get from your list to reverse complements:
Your <reversefunction> should contain a return statement which returns the sequence. If this is working, you can easily add one extra line to the previous loop, where dnaString represents your input file (mentioned above). You can then write output (see above).
As an addition to Ann's reply: index is a function that, in this case, gives the position of the first detected forward primer (item[0]) and the first position of the reverse primer (item[1]) in the reference. The sequence that you get returned contains the forward primer (since that is your starting point). You'll need some more manipulation to calculate and remove the forward primer (using its length), and/or to handle cases with multiple products or no product. Does that help? (E.g. fP is found on position 8, the rP is found on position 302, your sequence is positions 8 till 302.)
PS: You could also work without lists (see previous posts, where you actually loop over lines in a file, instead of looping over your created list).
I was able to resolve the issue with your help! Thank you so much!
You're welcome ;)
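For anyone landing on this thread later, here is a minimal sketch of the overall approach discussed above. It is not the code from the answer; predict_pcr_product and the toy template are invented for illustration, and reverse_complement is a stand-in that assumes uppercase A, C, G and T only. This version keeps both primers in the product by ending the slice after the reverse-complemented reverse primer, and it treats the template as linear, so a product that wraps around the origin of a circular plasmid such as pBR322 would not be found.

```python
def reverse_complement(seq):
    """Stand-in for the Problem 2 function; assumes uppercase A/C/G/T only."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def predict_pcr_product(template, forward_primer, reverse_primer):
    """Return the predicted product, or None if either primer is not found."""
    reverse_rc = reverse_complement(reverse_primer)
    start = template.find(forward_primer)
    if start == -1:
        return None
    # Search for the reverse-complemented reverse primer from the start onward.
    rc_start = template.find(reverse_rc, start)
    if rc_start == -1:
        return None
    end = rc_start + len(reverse_rc)  # include the whole primer site
    return template[start:end]


# Toy data, invented for illustration (not pBR322).
toy_template = "TTTT" + "ACGTAC" + "GGGGGG" + "CCATTG" + "AAAA"
forward = "ACGTAC"
reverse = "CAATGG"  # its reverse complement, CCATTG, occurs in the template

product = predict_pcr_product(toy_template, forward, reverse)
if product is not None:
    # One comma-separated output line, as the assignment asks for.
    print(",".join([forward, reverse_complement(reverse), product]))
```

To drop the primers from the product instead, the slice could start at start + len(forward_primer) and stop at rc_start.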