Question

Parsing Genbank File: Get Locus Tag Vs Product

0

11.4 years ago

biotech ▴ 570

That's all I need: the locus tag versus the product, as in the title. Is there an existing Perl/Python module that can do this?

Thanks

bioperl biopython • 5.8k views

ADD COMMENT • link updated 11.4 years ago by Peter 6.0k • written 11.4 years ago by biotech ▴ 570

1

Could you improve your question by giving an example input file (e.g. URL to an NCBI GenBank file) and the desired output (e.g. first few lines), since this is not clear. That would explain what you mean by product - which might be protein description, amino acid sequence, etc.

ADD REPLY • link 11.4 years ago by Peter 6.0k

0

Hi Peter, thanks for your reply. Check my question on Stack Overflow, where I also posted it. I'm using the Bio::GenBankParser module, as suggested by @TLP. It's giving me some issues, but it seems to fit my needs for now. http://stackoverflow.com/questions/22067785/parsing-genbank-file-get-locus-tag-vs-product

ADD REPLY • link 11.4 years ago by biotech ▴ 570

0

I don't think you got the most useful advice from SO. That module is an attempt to improve on something that works fine. Stick with the better supported, tried and tested original from BioPerl. Start with the Bio::SeqIO HOWTO and the Feature Annotation HOWTO.

ADD REPLY • link 11.4 years ago by Neilfws 49k

0

Hi Neil, I'll dig a little more into BioPerl's features; it's still very new to me. Thanks for your reply.

ADD REPLY • link 11.4 years ago by biotech ▴ 570

0

See this http://biopython.org/DIST/docs/api/Bio.GenBank-module.html and http://www.biocodershub.net/community/parse-genbank-file/

ADD REPLY • link 11.4 years ago by ancient_learner ▴ 680

score 1 · Answer 1 · 2014-02-28

Having read your question on StackOverflow (please don't double post like this), here's a minimal Biopython answer:

import sys
from Bio import SeqIO

# Path to the GenBank file, taken from the first command-line argument
filename = sys.argv[1]
for record in SeqIO.parse(filename, "genbank"):
    for feature in record.features:
        if feature.type == "CDS":
            # Qualifier values are lists; fall back to "???" when one is missing
            locus_tag = feature.qualifiers.get("locus_tag", ["???"])[0]
            product = feature.qualifiers.get("product", ["???"])[0]
            print("%s\t%s" % (locus_tag, product))

With minor changes you can write this out to a file instead.
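As a rough sketch of what those minor changes might look like: the same loop can be wrapped in a helper that writes to an open file handle instead of printing. The names `write_locus_products` and `main` are made up for illustration and are not part of Biopython; the tab-separated layout and the "???" placeholder are carried over from the script above.

```python
def write_locus_products(records, handle):
    """Write one 'locus_tag<TAB>product' line per CDS feature.

    `records` is any iterable of SeqRecord-like objects (for example the
    iterator returned by SeqIO.parse). Returns the number of lines written.
    """
    count = 0
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Qualifier values are lists; "???" marks a missing qualifier
            locus_tag = feature.qualifiers.get("locus_tag", ["???"])[0]
            product = feature.qualifiers.get("product", ["???"])[0]
            handle.write(f"{locus_tag}\t{product}\n")
            count += 1
    return count


def main(genbank_path, output_path):
    # Hypothetical entry point: parse a GenBank file, write a TSV file.
    # Biopython is imported here so the helper above has no hard dependency.
    from Bio import SeqIO

    with open(output_path, "w") as out:
        return write_locus_products(SeqIO.parse(genbank_path, "genbank"), out)
```

Calling `main(sys.argv[1], sys.argv[2])` from a script would then take the input and output paths from the command line, much like the original. Taking a file handle rather than a filename in the helper is just one design choice; it also lets you pass `sys.stdout` to get the original printing behaviour.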