How do I generate frames 5 and 6 from the complementary strand?
21 months ago

I need to generate 3 frames (1, 2, 3) for the cDNA (top) strand and 3 frames (4, 5, 6) for the complementary (bottom) strand. The code should go through each strand 3 letters at a time and look up the corresponding amino acid in a file. For the bottom strand, the translation should run backward (from right to left) to generate the frames. Based on my understanding of the picture, to generate frame 5 the last 2 letters (on the right) of the complementary strand should be skipped and translation should start from the third-to-last letter, moving backward; for frame 6, only the last letter of the complementary strand should be skipped. I am struggling with how to code frames 5 and 6. Could you please help me write the code? Note: the picture only shows two chunks of the strands; the actual length is 1170 letters.

I have tried many ways, but none of them generated the correct frame. For example, I used this code to generate frame 5:

def translate_complementary_frame5(complementary_strand, codon_table):
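    protein_sequence = ''
    # i = 2, 5, 8, ... so this takes letters 0-2, 3-5, ... counted from the LEFT
    # and reverses each codon on its own
    for i in range(2, len(complementary_strand)):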
        if i % 3 == 2:
            codon = complementary_strand[i-2:i+1][::-1]
            amino_acid = codon_table[codon]
            protein_sequence += amino_acid
    return protein_sequence

protein_sequence_frame5 = translate_complementary_frame5(complementary_strand, codon

[Image: the expected output, showing two chunks of the strands with their reading frames]

21 months ago
Mensur Dlakic

No need to reinvent the wheel. There is already a BioPython function to translate sequences in all 6 reading frames.

from Bio import SeqIO
from Bio.SeqUtils import six_frame_translations

for record in SeqIO.parse("random_dna.fasta", "fasta"):
    print(record.id)
    print(six_frame_translations(record.seq))

It prints the following:

YL17G14_0_k141_12468844
GC_Frame: a:1703 t:1765 g:1172 c:1130
Sequence: ataatacaaa ... agaccttgcc, 5770 nt, 39.90 %GC

1/1
  N  T  N  H  *  K  A  E  K  F  *  *  H  G  G  H  T  L  G  *
 *  Y  K  S  L  K  S  R  E  V  L  M  T  R  R  A  Y  A  G  I
I  I  Q  I  T  K  K  Q  R  S  F  D  D  T  A  G  I  R  W  D
ataatacaaatcactaaaaagcagagaagttttgatgacacggcgggcatacgctgggat   41 %
tattatgtttagtgatttttcgtctcttcaaaactactgtgccgcccgtatgcgacccta
Y  Y  L  D  S  F  L  L  S  T  K  I  V  R  R  A  Y  A  P  I
 L  V  F  *  *  F  A  S  F  N  Q  H  C  P  P  C  V  S  P  Y
  I  C  I  V  L  F  C  L  L  K  S  S  V  A  P  M  R  Q  S  L

61/21
  D  R  Q  K  I  T  F  Y  F  A  L  L  D  A  R  A  L  *  H  L
 R  P  P  K  N  Y  I  L  F  R  A  F  G  R  A  S  F  I  T  L
K  T  A  K  K  L  H  F  I  S  R  F  W  T  R  E  L  Y  N  T
aagaccgccaaaaaattacattttatttcgcgcttttggacgcgcgagctttataacact   40 %
ttctggcggttttttaatgtaaaataaagcgcgaaaacctgcgcgctcgaaatattgtga
L  G  G  F  F  *  M  K  N  R  A  K  P  R  A  L  K  I  V  S
 S  R  W  F  I  V  N  *  K  A  S  K  S  A  R  A  K  Y  C  K
  V  A  L  F  N  C  K  I  E  R  K  Q  V  R  S  S  *  L  V  *

121/41
  W  Q  H  R  P  C  P  I  F  M  S  N  Q  K  R  K  R  K  M  D
 M  A  T  *  T  L  P  D  F  Y  V  K  S  K  E  K  K  E  N  G
Y  G  N  I  D  L  A  R  F  L  C  Q  I  K  R  E  K  G  K  W
tatggcaacatagaccttgcccgatttttatgtcaaatcaaaagagaaaaaggaaaatgg   35 %
ataccgttgtatctggaacgggctaaaaatacagtttagttttctctttttccttttacc
I  A  V  Y  V  K  G  S  K  *  T  L  D  F  S  F  F  S  F  P
 H  C  C  L  G  Q  G  I  K  I  D  F  *  F  L  F  L  F  I  S
  P  L  M  S  R  A  R  N  K  H  *  I  L  L  S  F  P  F  H  I

Like Ram suggested, your question is not about genetic programming.

Thank you for your reply. Unfortunately, I have to write the codes myself for my project and cannot use any function from BioPython.

Is that a requirement for an academic project or examination?

Yes, it is. I have to write a program that generates the 6 reading frames for a given cDNA.

And you're absolutely sure you cannot use BioPython - as in, "do not use existing tools" is an explicitly stated request? If not, I'd check with your professor - reinventing the wheel is not a great way to test people, although this is a borderline worth-it sort of exercise.

This is what the instructions say: "You must not use pre-existing modules from BioPython or other repositories."

21 months ago
Mensur Dlakic

This would be a "pedestrian" way of writing the same function as above:

from Bio import SeqIO
from Bio.Seq import translate, reverse_complement

for record in SeqIO.parse("random_dna.fasta", "fasta"):
    record_rev = reverse_complement(record.seq)
    for frame in range(3):
        print(' Frame %d:' % (frame+1), str(record.seq[frame:].translate()))
    for frame in range(3):
        print(' Reverse frame %d:' % (frame+1), str(record_rev[frame:].translate()))

Looks like you already have a translation function. Now you need a forward sequence and a reverse complement sequence, and then feed the following to your translation function:

translate(forward)
translate(forward[1:])
translate(forward[2:])
translate(reverse)
translate(reverse[1:])
translate(reverse[2:])

That's it.
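Since the assignment rules out Biopython, here is a minimal pure-Python sketch of those same six translate(...) calls. It assumes the standard genetic code (NCBI table 1) in place of the codon-table file from the assignment, whose format isn't shown in this thread, and a sequence containing only A, C, G and T. The names `CODON_TABLE`, `reverse_complement`, `translate` and `six_frames` are illustrative, not the assignment's functions.

```python
from itertools import product

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
# This stands in for the assignment's codon-table file (an assumption).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumes only A, C, G and T occur (no N or other ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse, so the bottom strand reads 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, codon_table=CODON_TABLE):
    """Translate complete codons from the start of seq; a 1-2 nt remainder is dropped."""
    seq = seq.upper()
    return "".join(codon_table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(forward):
    """Frames 1-3 are forward[0:], [1:], [2:]; frames 4-6 are reverse[0:], [1:], [2:]."""
    reverse = reverse_complement(forward)
    frames = [translate(forward[k:]) for k in range(3)]
    frames += [translate(reverse[k:]) for k in range(3)]
    return frames


if __name__ == "__main__":
    seq = "ataatacaaatcactaaaaagcagagaagttttgatgacacggcgggcatacgctgggat"
    for number, protein in enumerate(six_frames(seq), start=1):
        print(number, protein)
```

With the first 60 nt from the output above, frame 1 comes out as IIQITKKQRSFDDTAGIRWD, the same as the bottom forward line printed by six_frame_translations. If the codon-table file maps codons to three-letter codes (GLU, HIS, ...), only the dictionary values change; the slicing logic stays the same.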

OP says they cannot use any function from BioPython, so they'll need to write their own translate and reverse_complement functions, as well as their own FASTA parser. A really stupid requirement IMO.
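For the FASTA-parser part, a minimal sketch could look like this. `read_fasta` is a hypothetical name; it assumes well-formed input (every record starts with `>`, sequence lines may wrap) and, unlike Biopython's `record.id`, it returns the whole header line rather than just its first word.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Works on an open file handle or any list of strings. Sequence lines
    that appear before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = [">seq1 test", "ATGAAA", "TTT", ">seq2", "GGG"]
    print(list(read_fasta(example)))  # [('seq1 test', 'ATGAAATTT'), ('seq2', 'GGG')]
```

Passing an open file works the same way, e.g. `with open(path) as handle: records = list(read_fasta(handle))`.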

Thank you for your reply. Could you please explain the logic you used for generating frame 6? (I should note that I am a beginner and do not have much coding experience.) Unfortunately, as I will have to submit my project online, I cannot copy-paste the code I have written so far here, but I will explain what I have done. We are given a file that contains a cDNA and another file that contains the codon table. I broke the sequence into chunks and then generated the complementary strand. Then I generated frame 1 (translating the whole top strand), frame 2 (translating from the second nucleotide of the top strand), and frame 3 (translating from the third nucleotide of the top strand). I have also managed to generate frame 4 (translating the whole complementary strand from right to left), but I am struggling to do the same for frames 5 and 6.

Also, the functions for loading the FASTA file and the codon table, as well as the translate function, are given.

Then I generated frame 1 (translating the whole top strand), frame 2 (translating from the second nucleotide of the top strand), and frame 3 (translating from the third nucleotide of the top strand).

You do exactly the same for the reverse complement strand. Frame 5 is the reverse complement strand starting from the second nucleotide, and frame 6 is the reverse complement strand starting from the third nucleotide. That's the meaning of reverse[1:] and reverse[2:].
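To see why reverse[1:] and reverse[2:] do the job: reading the drawn bottom strand from right to left is the same as reading the reverse complement from left to right, so skipping k letters on the right of the bottom strand is exactly reverse[k:]. Below is a minimal sketch, assuming plain A/C/G/T and using the first chunk from the output posted further down; the helper names are hypothetical. Which offset gets called "frame 5" is a labeling convention. In the output further down that the OP reports as working, the line labeled F5 matches reverse[2:] and the line labeled F6 matches reverse[1:] (amino acids displayed right to left, apart from one extra amino acid), so it is worth checking the labels against the reference picture.

```python
# Complement table for uppercase DNA; assumes only A, C, G and T occur.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Pair each base, keeping left-to-right order (the bottom strand as drawn)."""
    return seq.upper().translate(COMPLEMENT)


def bottom_right_to_left(bottom, skip):
    """Read the drawn bottom strand from right to left, ignoring `skip` letters on the right."""
    return bottom[:len(bottom) - skip][::-1]


top = "gagcatgttggcctggtcctttgctaggtactgtagagcaggtgagagagtgagggggaa"
bottom = complement(top)   # CTCGTACAACCGG..., as printed under the top strand
reverse = bottom[::-1]     # the reverse complement, read 5'->3'

for skip in range(3):
    # Skipping letters on the right of the drawn bottom strand is the
    # same as slicing letters off the start of the reverse complement.
    assert bottom_right_to_left(bottom, skip) == reverse[skip:]
    print(skip, reverse[skip:skip + 12])
```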

I wrote this for frame 5:

protein_sequence_frame5 = translate_complementary(complementary_strand[2:], codon_table)

But the output is not correct:

     GLUHISVALGLYLEUVALLEUCYS***VALLEU***SERARG***GLUSERGLUGLYGLU   F1
     SERMETLEUALATRPSERPHEALAARGTYRCYSARGALAGLYGLUARGVALARGGLYLYS   F2
     ALACYSTRPPROGLYPROLEULEUGLYTHRVALGLUGLNVALARGGLU***GLYGLYARG   F3
   3 gagcatgttggcctggtcctttgctaggtactgtagagcaggtgagagagtgagggggaa   60
     ----:----|----:----|----:----|----:----|----:----|----:----|
   3 ctcgtacaaccggaccaggaaacgatccatgacatctcgtccactctctcactccccctt   60
     ARGTHRTHRGLYPROGLYASNASPPRO***HISLEUVALHISSERLEUTHRPROPROSER   F6
     CYSTHRPROARGTHRARGGLN***THRSERTYRLEULEUHISSERLEUSERPROSERPRO   F5
     LEUMETASNALAGLNASPLYSALALEUTYRGLNLEUALAPROSERLEUTHRLEUPROPHE   F4

Sorry, it does not show the exact output; the lines are not aligned.

I can do that for you. Please just let me know if the * are part of the output.

So each "***" indicates a stop codon.

I formatted your post - are those *s part of the output or were you trying to highlight something?

The *s show the stop codons.

Now it is aligned, thank you.

It takes a bit of work. You can test this stuff in Notepad++ (Windows) or TextWrangler/BBEdit (macOS): if it aligns in those plain text editors, it will align here. Plus, you need to use code formatting. Without it, * is interpreted as italics, ** as bold and *** as bold+italics.

Thank you for letting me know. But in the app that I'm using to code, frame 2 is shifted to the right by one nucleotide and frame 3 by two nucleotides. The same goes for frames 5 and 6.
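A sketch of one way to produce those shifts when printing a chunk, assuming each amino acid is shown as a three-letter code (so one code spans exactly one codon) and that proteins arrive as one-letter strings with `*` for stop. A forward frame with offset k simply starts k characters in. For a frame read right to left after skipping `skip` letters on the right, the first amino acid read sits at the right-hand end, so the protein is printed reversed, and the leftover on the left, (chunk_length - skip) % 3 characters, becomes padding. `THREE_LETTER`, `forward_line` and `reverse_line` are illustrative names.

```python
# Hypothetical one-letter to three-letter mapping; stop is printed as "***"
# to match the output in this thread.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "***",
}


def forward_line(protein, offset):
    """Frames 1-3: the first codon starts `offset` nucleotides into the chunk."""
    return " " * offset + "".join(THREE_LETTER[aa] for aa in protein)


def reverse_line(protein, skip, chunk_length):
    """Frames read right to left after skipping `skip` letters on the right.

    `protein` is in reading order and is assumed to hold only the complete
    codons of this chunk, so it is reversed for display and left-padded.
    """
    pad = (chunk_length - skip) % 3
    return " " * pad + "".join(THREE_LETTER[aa] for aa in reversed(protein))


if __name__ == "__main__":
    print(forward_line("MKF", 1))
    print(reverse_line("FPL", 0, 9))
```

For a 60-letter chunk this gives a padding of 2 for the frame that skips one letter and 1 for the frame that skips two, which is how the F6 and F5 lines sit in the aligned output posted later in the thread.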

You can edit and reproduce what you see exactly - I went with my intuition, which may or may not match what you want to show us.

protein_sequence_frame5 = translate_complementary(complementary_strand[2:], codon_table)

This is 6th frame, not 5th. complementary_strand is the same as complementary_strand[0:], which is 4th reading frame (or the first reverse complement frame). complementary_strand[1:] is the 5th frame, and complementary_strand[2:] the 6th frame.

Python counting is zero-based, and you wouldn't be the first to get confused by it.

It actually worked, thank you so much! But there is a small problem: at the beginning of the complementary strand (the right-hand side) of each chunk, there is an extra amino acid that I need to remove to get the same result as the picture I attached to my question. Here is part of my output:

 GLUHISVALGLYLEUVALLEUCYS***VALLEU***SERARG***GLUSERGLUGLYGLU     F1
      SERMETLEUALATRPSERPHEALAARGTYRCYSARGALAGLYGLUARGVALARGGLYLYS    F2
       ALACYSTRPPROGLYPROLEULEUGLYTHRVALGLUGLNVALARGGLU***GLYGLYARG   F3
   3 gagcatgttggcctggtcctttgctaggtactgtagagcaggtgagagagtgagggggaa   60
     ----:----|----:----|----:----|----:----|----:----|----:----|
   3 ctcgtacaaccggaccaggaaacgatccatgacatctcgtccactctctcactccccctt   60
       CYSTHRPROARGTHRARGGLN***THRSERTYRLEULEUHISSERLEUSERPROSERPRO   F6
      ALAHISGLNGLYPROGLYLYSSERPROVALTHRSERCYSTHRLEUSERHISPROPROLEU    F5
     LEUMETASNALAGLNASPLYSALALEUTYRGLNLEUALAPROSERLEUTHRLEUPROPHE     F4

Could you help me to remove that extra amino acid from the beginning of each chunk?
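One hedged check for the extra amino acid: after skipping `skip` letters, a chunk of n nucleotides holds only (n - skip) // 3 complete codons, so a 60-letter chunk gives 20, 19 and 19 amino acids for skips of 0, 1 and 2. The F5 and F6 lines above each have 20 three-letter codes, one more than that, which suggests that a codon straddling the chunk boundary is being assigned to the chunk (for example, if the frames are computed on the whole strand and then cut into chunks). `complete_codons` and `extra_amino_acids` are hypothetical helpers for spotting this.

```python
def complete_codons(chunk_length, skip):
    """Whole codons left in a chunk after skipping `skip` nucleotides."""
    return (chunk_length - skip) // 3


def extra_amino_acids(protein, chunk_length, skip):
    """How many more amino acids a per-chunk frame has than whole codons fit (0 if it fits)."""
    return len(protein) - complete_codons(chunk_length, skip)


if __name__ == "__main__":
    for skip in range(3):
        print(skip, complete_codons(60, skip))  # prints 0 20, then 1 19, then 2 19
    # The F6 line above, rewritten as one-letter codes (20 amino acids)
    f6 = "CTPRTRQ*TSYLLHSLSPSP"
    print(extra_amino_acids(f6, 60, 1))  # prints 1
```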

I'm actually talking about frames 5 and 6.

You're still not using code formatting. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

Thank you so much.
