How can I find the ORFs in a sequence using Python? I also need to find all codons.
Thanks.
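List of all codons (every overlapping triplet, so all three forward reading frames are covered):
genome = 'ACGTACGT....'
# list(...) and print(...) make this work in both Python 2 and Python 3
print(list(map(lambda x: ''.join(x), zip(genome[0:], genome[1:], genome[2:]))))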
Set of all codons (the distinct triplets only):
genome = 'ACGTACGT....'
# print(...) works in both Python 2 and Python 3
print(set(map(lambda x: ''.join(x), zip(genome[0:], genome[1:], genome[2:]))))
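If your genome is large and you are on Python 2, use itertools.izip instead of zip so the triplets are produced lazily rather than as one big list (in Python 3, zip is already lazy and izip no longer exists):
import itertools
itertools.izip(genome[0:], genome[1:], genome[2:])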
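The zip-based snippets return every overlapping triplet. If you want the non-overlapping codons of a single reading frame instead, something like the following might work. This is only a sketch: `codons_in_frame` is a hypothetical helper name, not part of any library, and it assumes the sequence is a plain string.

```python
def codons_in_frame(seq, frame=0):
    """Yield the non-overlapping codons of seq, starting at offset frame (0, 1 or 2).

    Hypothetical helper for illustration; trailing bases that do not
    fill a complete codon are ignored.
    """
    for i in range(frame, len(seq) - 2, 3):
        yield seq[i:i + 3]


# Example usage on a short made-up sequence
for frame in range(3):
    print(frame, list(codons_in_frame("ACGTACGT", frame)))
```

Because it is a generator, only one codon is held in memory at a time, which matters more than the zip/izip choice for very long sequences.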
To find ORFs, it's better to use Biopython (see zev.kronenberg's link).
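If you just want to see the idea without Biopython, here is a rough pure-Python sketch. It is not what the Biopython tutorial does. It makes several assumptions: an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, TGA, i.e. the standard genetic code), only the forward strand is scanned, and `find_orfs` and `min_codons` are names invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start/end are 0-based slice coordinates, so seq[start:end] includes
    the stop codon. min_codons counts codons before the stop (ATG included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Example usage on a short made-up sequence
example = "CCATGAAATAGCC"
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

Different tools define an ORF differently (start codon required or not, which strands, minimum length, which codon table), so a sketch like this will not necessarily reproduce another tool's output.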
Googled:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc224
Thanks,
I worked with the FASTA file NC_005816.fna, following the steps in the tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc224), but when I compare these ORFs with the ones obtained from the NCBI web tool (http://www.ncbi.nlm.nih.gov/projects/gorf/orfig.cgi), the results are different. Why?
I need to use Python because my sequence file is 4 GB, so I can't use the NCBI tool.
Thanks.