How To Get "Journals" Information From Genbank Entry Using Python
11.1 years ago
fm271 ▴ 20
LOCUS       AAW04511                  12 aa            linear   PAT 15-DEC-2004
DEFINITION  Sequence 89 from patent US 6790938.
ACCESSION   AAW04511
VERSION     AAW04511.1  GI:56612658
DBSOURCE    accession AAW04511.1
KEYWORDS    .
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1  (residues 1 to 12)
  AUTHORS   Berchtold,P. and Escher,R.F.A.
  TITLE     Anti-GPIIb/IIIa recombinant antibodies
  JOURNAL   Patent: US 6790938-A 89 14-SEP-2004;
            ASAT AG Applied Science & Technology; Zug;
            DEX;
  REMARK    CAMBIA Patent Lens: US 6790938
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="unknown"
ORIGIN      
        1 gsgsylgyyf dy
//

In the above GenBank entry, how can I get the "journal" and "remark" information present in "REFERENCE"? I can access the authors and title, but not the journal and remark information.

from Bio import Entrez, SeqIO
handle = Entrez.efetch(db="protein", id="AAW04511",rettype="gb")
seq_record = SeqIO.read(handle, "genbank")
seqAnn = seq_record.annotations
seqAnn['references'][0].title
seq_record.annotations['references'][0].authors

Any help will be appreciated.

python biopython • 3.7k views

The line beginning "seqAnn" seems to be irrelevant to your problem.


Sorry, I forgot to include the line; edited. But Peter has already answered this.

11.1 years ago
Peter 6.0k

Thank you for posting a self contained example :)

>>> from Bio import Entrez, SeqIO
>>> Entrez.email = "Your.Name@example.org"
>>> handle = Entrez.efetch(db="protein", id="AAW04511",rettype="gb")
>>> seq_record = SeqIO.read(handle, "genbank")
>>> seq_record.annotations['references'][0].authors
'Berchtold,P. and Escher,R.F.A.'
>>> seq_record.annotations['references'][0].title
'Anti-GPIIb/IIIa recombinant antibodies'
>>> seq_record.annotations['references'][0].journal
'Patent: US 6790938-A 89 14-SEP-2004; ASAT AG Applied Science & Technology; Zug; DEX;'
>>> seq_record.annotations['references'][0].comment
'CAMBIA Patent Lens: US 6790938'
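
If a record carries several REFERENCE blocks, the same attributes can be collected in a loop. This is only a minimal sketch: reference_summaries is a hypothetical helper name, and the stand-in record built from SimpleNamespace merely mimics the shape of the SeqRecord above so the snippet runs without network access. With a real record you would pass seq_record instead.

```python
from types import SimpleNamespace

# Attribute names as they appear on the reference objects shown above
FIELDS = ("authors", "title", "journal", "comment")


def reference_summaries(record):
    """Return one dict per reference with authors, title, journal and comment."""
    summaries = []
    for ref in record.annotations.get("references", []):
        # getattr with a default guards against objects lacking a field
        summaries.append({name: getattr(ref, name, "") for name in FIELDS})
    return summaries


# Hypothetical stand-in for the SeqRecord; a real one comes from SeqIO.read(...)
demo_record = SimpleNamespace(annotations={"references": [SimpleNamespace(
    authors="Berchtold,P. and Escher,R.F.A.",
    title="Anti-GPIIb/IIIa recombinant antibodies",
    journal="Patent: US 6790938-A 89 14-SEP-2004; "
            "ASAT AG Applied Science & Technology; Zug; DEX;",
    comment="CAMBIA Patent Lens: US 6790938",
)]})

for summary in reference_summaries(demo_record):
    print(summary["journal"])
    print(summary["comment"])
```

Note that the REMARK line of the GenBank entry ends up in the comment field, not in an attribute called remark.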

I'm surprised you didn't guess it was just .journal, given you'd found .title and .authors fine. The REMARK line is less obvious: it is stored as .comment. A useful tip for exploring a new data structure in Python: the dir(...) function lists all the attributes and methods (for now, ignore the ones starting with an underscore):

>>> dir(seq_record.annotations['references'][0])
[..., 'authors', 'comment', 'consrtm', 'journal', 'location', 'medline_id', 'pubmed_id', 'title']
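
To skip the underscore names automatically, the dir(...) output can be filtered. A small sketch, assuming nothing beyond the standard library: public_attributes is a hypothetical helper name, and DemoReference is a made-up stand-in class, not Biopython's own reference class.

```python
def public_attributes(obj):
    """List attribute and method names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class DemoReference:
    """Hypothetical stand-in with a few of the fields listed above."""

    def __init__(self):
        self.authors = ""
        self.title = ""
        self.journal = ""
        self.comment = ""


# dir() returns names in sorted order, so the filtered list is sorted too
print(public_attributes(DemoReference()))
# ['authors', 'comment', 'journal', 'title']
```

On a real record, public_attributes(seq_record.annotations['references'][0]) should give the same kind of list as the dir(...) call above, minus the underscore entries.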

Thanks for the reply. I remember trying this, but I must have been doing something wrong.
