10.8 years ago
biotech
So that's all, I just need to get this. Is there an existing Perl/Python module that could do this?
Thanks
Having read your question on StackOverflow (please don't double post like this), here's a minimal Biopython answer:
import sys
from Bio import SeqIO

filename = sys.argv[1]  # Input filename, taken from the first command-line argument
for record in SeqIO.parse(filename, "genbank"):
    for feature in record.features:
        if feature.type == "CDS":
            # Qualifier values are lists; fall back to "???" when one is missing
            locus_tag = feature.qualifiers.get("locus_tag", ["???"])[0]
            product = feature.qualifiers.get("product", ["???"])[0]
            print("%s\t%s" % (locus_tag, product))
With minor changes you can write this out to a file instead.
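As a rough sketch of those minor changes, the version below writes the same tab-separated lines to a file rather than printing them. The helper names `cds_rows` and `write_table`, and the use of a second command-line argument for the output filename, are assumptions made for illustration; they are not part of Biopython.

```python
import sys


def cds_rows(records):
    """Yield (locus_tag, product) for every CDS feature in the given records."""
    for record in records:
        for feature in record.features:
            if feature.type == "CDS":
                # Qualifier values are lists; fall back to "???" when one is missing
                locus_tag = feature.qualifiers.get("locus_tag", ["???"])[0]
                product = feature.qualifiers.get("product", ["???"])[0]
                yield locus_tag, product


def write_table(records, handle):
    """Write one tab-separated line per CDS to handle; return the line count."""
    count = 0
    for locus_tag, product in cds_rows(records):
        handle.write("%s\t%s\n" % (locus_tag, product))
        count += 1
    return count


# Runs only when called as: python script.py input.gbk output.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    from Bio import SeqIO  # imported here so the helpers above work without Biopython

    in_name, out_name = sys.argv[1], sys.argv[2]
    with open(out_name, "w") as out_handle:
        n = write_table(SeqIO.parse(in_name, "genbank"), out_handle)
    sys.stderr.write("Wrote %d CDS lines to %s\n" % (n, out_name))
```

Keeping the extraction in a generator separate from the writing step means the same loop can feed either `print` or a file handle.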
Could you improve your question by giving an example input file (e.g. URL to an NCBI GenBank file) and the desired output (e.g. first few lines), since this is not clear. That would explain what you mean by product - which might be protein description, amino acid sequence, etc.
Hi Peter, thanks for your reply. Check my question on Stack Overflow; I also posted it there. I'm using the Bio::GenBankParser module, as suggested by @TLP. It's giving me some issues but seems to fit my needs at the present time. http://stackoverflow.com/questions/22067785/parsing-genbank-file-get-locus-tag-vs-product
I don't think you got the most useful advice from SO. That module is an attempt to improve on something that works fine. Stick with the better supported, tried and tested original from BioPerl. Start with the Bio::SeqIO HOWTO and the Feature Annotation HOWTO.
Hi Neil, I'll dig a little more into BioPerl's features; it's still very new to me. Thanks for your reply.
See this http://biopython.org/DIST/docs/api/Bio.GenBank-module.html and http://www.biocodershub.net/community/parse-genbank-file/