Entering edit mode
6.2 years ago
anasjamshed
▴
140
I made a script with the help of rody blog but i also want to convert longest orf gene in CDS
I tried this code:
Genome_Seq = open("C:\\Users\\cnn\\Desktop\\MS(BI)\\Bioinformatics 1\\Droso.txt")
Genetic_Codes = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}
def translate_dna(Genome_Seq):
"""Return the translation of a DNA sequence"""
Genome_Seq=Genome_Seq.read()
last_codon_start = len(Genome_Seq) - 2
protein = ""
# for each codon in the dna, append an amino acid to the protein
for start in range(0,last_codon_start,3):
codon = Genome_Seq[start:start+3]
amino_acid = Genetic_Codes.get(Genome_Seq.upper(),'X')
protein = protein + amino_acid
return protein
#Defining and Declaring of Function
def orfFINDER(dna,frame):
start_codon = ['ATG']
stop_codons = ['TGA', 'TAG', 'TAA']
start_positions = []
stop_positions = []
num_starts=0
num_stops=0
for i in range(frame-1,len(dna),3):
codon=dna[i:i+3]
if codon in start_codon:
start_positions += str(i+1).splitlines()
if codon in stop_codons:
stop_positions += str(i+1).splitlines()
for line in stop_positions:
num_stops += 1
for line in start_positions:
num_starts += 1
orffound = {}
if num_stops >=1 and num_starts >=1: #first statment: the number of stop codons and start condos are greater than or equal to 1;
orfs = True
stop_before = 0
start_before = 0
if num_starts > num_stops:
num_runs = num_starts
if num_stops > num_starts:
num_runs = num_stops
if num_starts == num_stops:
num_runs = num_starts
position_stop_previous = 0
position_start_previous = 0
counter = 0
for position_stop in stop_positions:
position_stop = int(position_stop.rstrip()) + 2
for position_start in start_positions:
position_start = position_start.rstrip()
if int(position_start) < int(position_stop) and int(position_stop) > int(position_stop_previous) and int(position_start) > int(position_stop_previous):
counter += 1
nameorf = "ORF"+str(counter)
position_stop_previous += int(position_stop) - int(position_stop_previous)
position_start_previous += int(position_start) - int(position_start_previous)
sizeorf = int(position_stop) - int(position_start) + 1
orffound[nameorf] = position_start,position_stop,sizeorf,frame
else:
pass
else:
orfs = False
return orffound
#FUNCTION END
#READ FASTA FILE AND SAVE HEADERS AND SEQUENCES IN A DICTIONARY
Genome_Seqs={}
for Line in Genome_Seq:
Line = Line.rstrip()
if Line[0] == '>':
Words=Line.split()
Name=Words[0][1:]
Genome_Seqs[Name]=''
else:
Genome_Seqs[Name] = Genome_Seqs[Name] + Line
#DEFINE FRAME TO FIND ORF
#if frame = 1, start from the first position in the sequence
frame= int(input("Enter Frame Number: "))
#EXECUTE THE ORFFINDER FUNCTION
for i in Genome_Seqs.items():
header= i[0]
seq = i[1]
orf = orfFINDER(seq,frame)
print("\nThe Total ORF Found in frame" ,frame, "is : " ,len(orf) ,'\n\n' )
for i in orf.items():
numorf=i[0]
startorf=orf[numorf][0]
stoporf=orf[numorf][1]
lengthorf=orf[numorf][2]
frameorf=orf[numorf][3]
print (numorf,"starts from",startorf,"bp and stop at",stoporf,"bp , length:",lengthorf,", Frame:",frameorf,'\n')
if lengthorf >= 1503:
print(numorf , "in frame" ,frameorf, "is legimate ORF as it contains more than 500 Base Pairs b/w start and stop codon and also contains more than 500 amino acids so we predict that it would be a gene \n","\nlength of Predicted gene: ",lengthorf,"bp")
Genome_Seq.close()
I stored genetic codes in dictionary but i dont know how to call orf converted into protein through codon
Please use the formatting bar (especially the
code
option) to present your post better. I've done it for you this time.Why not just use the
translate
method of BioPython?Sir i want to translate only orf which i calculate through script
You can script BioPython. Please make this question more straightforward than your last.
What good reason is there not to use BioPython in this instance?
how to use biopython in it? in this part of question i already calculated orf as shown in the code but after that i want to calculate orf gene cds with help of genetic code which i also stored in dictionary
Have you searched for biopython and taken a look at the cookbook and tutorial?
You'll have to explain this a little better and take some more time to formulate your sentences. It is currently hard to read.
Sir in the given code i first calculate orf then i stored genetic codes in dictionary as shown in starting because i want to convert the orf 1 nucleotide sequence into protein by using genetic code I also tried biopython but translate function gave me a wrong output
We understand the code you already have - that is not the issue. How do you know its wrong? Have you tried using different translation tables?
sir i made translation table by myself in code as you can see in the beginning.
What makes you think the output is wrong? Biopython is a very mature package, I'd be really surprised if the translate function is wrong.
biopython translate function not works well for input fasta files
Well that’s demonstrably not true. Fasta files are perfectly valid inputs to BioPython. The translate function is a method of SeqRecords so the input file type has very little relevance.
You should show what biopython code you tried because I’m willing to bet you simply aren’t using it correctly.