Hi Francis,
If I understand correctly, you want to take a .gb file of nucleotide sequences and write a new file with the protein sequences in FASTA format? In Biopython, sequences and records don't have a particular file format until they're written out (they are Python objects), so do the conversion from DNA to protein first, then write the results out in the format you want. You also don't have to go DNA -> mRNA -> protein: seq.translate() works fine on a DNA sequence.
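To see why the mRNA step can be skipped, here is a minimal plain-Python sketch. It is not how Biopython implements translate(); the names toy_translate and CODON_TABLE are made up for illustration. It only shows the idea: the standard codon table can be keyed on T instead of U, so DNA codons can be looked up directly.

```python
# Standard genetic code (NCBI translation table 1), written with T instead of U
# so that DNA codons can be looked up directly. Bases are ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def toy_translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    Hypothetical helper for illustration only. It raises KeyError on
    ambiguous bases such as N, which a real library handles more gracefully.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(toy_translate(coding_dna))  # MAIVMGR*KGAR*
```

Like this sketch, the call translates straight through stop codons and marks them with '*', so check the output if you expected the protein to end at the first stop.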
I don't have Biopython on the computer in front of me, so the code below is untested:
EDIT
The translate() method belongs to sequences (Seq), rather than to the record that holds the sequence together with its ID, description, etc. (SeqRecord). So you need to define a function that takes each record and makes a new SeqRecord with the translated sequence:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def translate_record(record):
    """Return a new SeqRecord holding the translated sequence."""
    return SeqRecord(seq=record.seq.translate(),
                     id=record.id + "_translated",
                     description=record.description)

infile = 'sequence.gb'  # a filename; 'handle' usually refers to an open() object
output = 'sequence.fasta'
records = SeqIO.parse(infile, 'gb')  # SeqIO is imported directly, so no 'Bio.' prefix

# start an empty list, which we'll fill with protein seqs
proteins = []
for rec in records:
    proteins.append(translate_record(rec))

# passing the filename lets Biopython open and close the output file itself
SeqIO.write(proteins, output, 'fasta')
If you prefer, you can use generator expressions in place of the for loop (that way you don't have to load the whole list into memory):
proteins = (translate_record(r) for r in records)
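One thing to keep in mind with the generator version: it is lazy and single-use. Nothing is translated until something (such as SeqIO.write) iterates over it, and after one pass it is exhausted. The sketch below uses plain strings and a made-up fake_translate() function as stand-ins for the records and for translate_record(), so it runs without Biopython:

```python
calls = []  # remembers which items have actually been processed


def fake_translate(name):
    """Hypothetical stand-in for translate_record(); just notes that it ran."""
    calls.append(name)
    return name + "_translated"


# An iterator of names stands in for the iterator that SeqIO.parse() returns.
records = iter(["rec1", "rec2", "rec3"])

proteins = (fake_translate(r) for r in records)
print(calls)        # [] : nothing has been processed yet

first_pass = list(proteins)   # the work happens here, one item at a time
second_pass = list(proteins)  # the generator is already exhausted

print(first_pass)   # ['rec1_translated', 'rec2_translated', 'rec3_translated']
print(second_pass)  # []
```

For a single write to a file that is exactly what you want. If you need to go over the proteins more than once, build the list instead.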
You should check out the Biopython wiki for some examples of Seq and SeqIO.
It keeps giving me an error saying: 'SeqRecord' object has no attribute 'translate'
Oh right, knew I should have checked it before I posted it. translate() is a method on Seq, so in this case it's rec.seq.translate(). I'll edit this with tested code later, but for now see this bit of the tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:SeqIO-reverse-complement
Wonderful!! Thanks so much!
Just to let you know, I'll be borrowing your function for my code, if you don't mind.
Francis, of course you can use that snippet. I would say it will be of more benefit to you in the long run if you play around with it in an interactive session: use print, dir() and help() to work out what each piece of the code is doing.
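As a rough illustration of that kind of poking around, here is a sketch with two toy classes. ToySeq and ToyRecord are hypothetical stand-ins, not Biopython's Seq and SeqRecord; they only mimic the "record wraps a sequence" layout so you can see how dir() would have pointed to where translate() lives.

```python
class ToySeq:
    """Hypothetical stand-in for a sequence object (not Biopython's Seq)."""

    def __init__(self, letters):
        self.letters = letters

    def translate(self):
        """Pretend translation; a real sequence would return a protein."""
        return "protein for " + self.letters


class ToyRecord:
    """Hypothetical stand-in for a record: a sequence plus some metadata."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


def public_names(obj):
    """Return what dir() reports for obj, minus underscore-prefixed names."""
    return [name for name in dir(obj) if not name.startswith("_")]


rec = ToyRecord(ToySeq("ATGGCC"), id="toy1")

print(public_names(rec))          # ['id', 'seq'] : no translate here
print(public_names(rec.seq))      # ['letters', 'translate']
print(hasattr(rec, "translate"))  # False, hence the AttributeError
print(rec.seq.translate())        # protein for ATGGCC
```

In a real session you would call dir(rec), dir(rec.seq) and help(rec.seq.translate) on the objects that SeqIO.parse() gives you.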