Question

Python Script to to convert orf gene into protein

0

Entering edit mode

6.1 years ago

anasjamshed ▴ 140

I made a script with the help of rody blog but i also want to convert longest orf gene in CDS

I tried this code:

Genome_Seq = open("C:\\Users\\cnn\\Desktop\\MS(BI)\\Bioinformatics 1\\Droso.txt") 

Genetic_Codes = { 
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T', 
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R', 
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P', 
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R', 
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A', 
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G', 
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L', 
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'} 

def translate_dna(Genome_Seq): 
    """Return the translation of a DNA sequence"""
    Genome_Seq=Genome_Seq.read()
    last_codon_start = len(Genome_Seq) - 2 
    protein = "" 

    # for each codon in the dna, append an amino acid to the protein
    for start in range(0,last_codon_start,3): 
        codon = Genome_Seq[start:start+3] 
        amino_acid = Genetic_Codes.get(Genome_Seq.upper(),'X') 
        protein = protein + amino_acid

    return protein 

#Defining and Declaring of Function
def orfFINDER(dna,frame):

    start_codon = ['ATG']
    stop_codons = ['TGA', 'TAG', 'TAA']
    start_positions = []
    stop_positions = []
    num_starts=0
    num_stops=0


    for i in range(frame-1,len(dna),3):
        codon=dna[i:i+3]
        if codon in start_codon:
            start_positions += str(i+1).splitlines()
        if codon in stop_codons:
            stop_positions += str(i+1).splitlines()

    for line in stop_positions:
        num_stops += 1

    for line in start_positions:
        num_starts += 1

    orffound = {}

    if num_stops >=1 and num_starts >=1: #first statment: the number of stop codons and start condos are greater than or equal to 1;

        orfs = True
        stop_before = 0
        start_before = 0

        if num_starts > num_stops:
            num_runs = num_starts
        if num_stops > num_starts:
            num_runs = num_stops
        if num_starts == num_stops:
            num_runs = num_starts

        position_stop_previous = 0
        position_start_previous = 0
        counter = 0

        for position_stop in stop_positions:

            position_stop = int(position_stop.rstrip()) + 2

            for position_start in start_positions:

                position_start = position_start.rstrip()

                if int(position_start) < int(position_stop) and int(position_stop) > int(position_stop_previous) and int(position_start) > int(position_stop_previous):

                    counter += 1
                    nameorf = "ORF"+str(counter)
                    position_stop_previous += int(position_stop) - int(position_stop_previous)
                    position_start_previous += int(position_start) - int(position_start_previous)
                    sizeorf = int(position_stop) - int(position_start) + 1

                    orffound[nameorf] = position_start,position_stop,sizeorf,frame

                else:

                    pass

    else:

        orfs = False

    return orffound
#FUNCTION END


 #READ FASTA FILE AND SAVE HEADERS AND SEQUENCES IN A DICTIONARY
Genome_Seqs={}

for Line in Genome_Seq:
    Line = Line.rstrip()
    if Line[0] == '>':
        Words=Line.split() 
        Name=Words[0][1:]
        Genome_Seqs[Name]=''
    else:
        Genome_Seqs[Name] = Genome_Seqs[Name] + Line





#DEFINE FRAME TO FIND ORF
#if frame = 1, start from the first position in the sequence
frame= int(input("Enter Frame Number: "))


#EXECUTE THE ORFFINDER FUNCTION
for i in Genome_Seqs.items():
    header= i[0]
    seq = i[1]
    orf = orfFINDER(seq,frame)
    print("\nThe Total ORF Found in frame" ,frame, "is : " ,len(orf) ,'\n\n' ) 
    for i in orf.items():
        numorf=i[0]
        startorf=orf[numorf][0]
        stoporf=orf[numorf][1]
        lengthorf=orf[numorf][2]
        frameorf=orf[numorf][3] 
        print (numorf,"starts from",startorf,"bp and stop at",stoporf,"bp , length:",lengthorf,", Frame:",frameorf,'\n')
        if lengthorf >= 1503:
            print(numorf , "in frame" ,frameorf, "is legimate ORF as it contains more than 500 Base Pairs b/w start and stop codon and also contains more than 500 amino acids so we predict that it would be a gene \n","\nlength of Predicted gene: ",lengthorf,"bp")

Genome_Seq.close()

I stored genetic codes in dictionary but i dont know how to call orf converted into protein through codon

gene protein python genetic code • 4.4k views

ADD COMMENT • link updated 6.1 years ago by Ram 44k • written 6.1 years ago by anasjamshed ▴ 140

0

Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

ADD REPLY • link 6.1 years ago by Ram 44k

0

Entering edit mode

Why not just use the translate method of BioPython?

ADD REPLY • link 6.1 years ago by Joe 21k

0

Entering edit mode

Sir i want to translate only orf which i calculate through script

ADD REPLY • link 6.1 years ago by anasjamshed ▴ 140

0

Entering edit mode

You can script BioPython. Please make this question more straightforward than your last.

What good reason is there not to use BioPython in this instance?

ADD REPLY • link 6.1 years ago by Joe 21k

0

Entering edit mode

how to use biopython in it? in this part of question i already calculated orf as shown in the code but after that i want to calculate orf gene cds with help of genetic code which i also stored in dictionary

ADD REPLY • link 6.1 years ago by anasjamshed ▴ 140

0

Entering edit mode

Have you searched for biopython and taken a look at the cookbook and tutorial?

calculate orf gene cds with help of genetic code which i also stored in dictionary

You'll have to explain this a little better and take some more time to formulate your sentences. It is currently hard to read.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

0

Entering edit mode

Sir in the given code i first calculate orf then i stored genetic codes in dictionary as shown in starting because i want to convert the orf 1 nucleotide sequence into protein by using genetic code I also tried biopython but translate function gave me a wrong output

ADD REPLY • link 6.1 years ago by anasjamshed ▴ 140

0

Entering edit mode

We understand the code you already have - that is not the issue. How do you know its wrong? Have you tried using different translation tables?

ADD REPLY • link 6.1 years ago by Joe 21k

0

Entering edit mode

sir i made translation table by myself in code as you can see in the beginning.

ADD REPLY • link 6.1 years ago by anasjamshed ▴ 140

0

Entering edit mode

I also tried biopython but translate function gave me a wrong output

What makes you think the output is wrong? Biopython is a very mature package, I'd be really surprised if the translate function is wrong.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

0

Entering edit mode

biopython translate function not works well for input fasta files

ADD REPLY • link 6.1 years ago by anasjamshed ▴ 140

1

Entering edit mode

Well that’s demonstrably not true. Fasta files are perfectly valid inputs to BioPython. The translate function is a method of SeqRecords so the input file type has very little relevance.

You should show what biopython code you tried because I’m willing to bet you simply aren’t using it correctly.

ADD REPLY • link 6.1 years ago by Joe 21k