Parsing Genbank Format In Bioperl.
11.9 years ago
Daniel ★ 4.0k

I am attempting (with BioPerl) to extract the JOURNAL field from a set of GenBank records, but I can't find a reference listing the methods that are available, e.g.

while (my $seq = $in->next_seq() ) {
        print $seq->accession . "\n";
}

prints the accession number

while (my $seq = $in->next_seq() ) {
        print $seq->desc . "\n";
}

prints the description

while (my $seq = $in->next_seq() ) {
        print $seq->seq . "\n";
}

prints the gene sequence, and so on.

This has just been gleaned from the BioPerl site and other questions, as I can't find a reference for the whole scheme. Can anyone point me in the right direction? Unfortunately, http://www.bioperl.org/wiki/Module:Bio::SeqIO::genbank is a dead end.

Thanks


For reference:

LOCUS       JQ354682                1420 bp    DNA     linear   PLN 01-JAN-2013
DEFINITION  Gomphonema clevei strain TCC507 ribulose-1,5-bisphosphate
            carboxylase/oxygenase large subunit (rbcL) gene, partial cds;
            chloroplast.
ACCESSION   JQ354682
VERSION     JQ354682.1  GI:410947001
KEYWORDS    .
SOURCE      chloroplast Gomphonema clevei
  ORGANISM  Gomphonema clevei
            Eukaryota; Stramenopiles; Bacillariophyta; Bacillariophyceae;
            Bacillariophycidae; Cymbellales; Gomphonemataceae; Gomphonema.
REFERENCE   1  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Using a polyphasic approach to explore the diversity and
            geographical distribution of the Gomphonema parvulum (Kutzing)
            Kutzing complex (Bacillariophyta)
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-JAN-2012) Asconit Consultants, 3 bld de Clairfont
            Bat. G, Toulouges F-66350, France
FEATURES             Location/Qualifiers
     source          1..1420
                     /organism="Gomphonema clevei"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /strain="TCC507"
                     /isolation_source="river"
                     /db_xref="taxon:1223578"
                     /country="Mayotte"
                     /collection_date="20-Apr-2009"
     gene            <1..>1420
                     /gene="rbcL"
     CDS             <1..>1420
                     /gene="rbcL"
                     /codon_start=1
                     /transl_table=11
                     /product="ribulose-1,5-bisphosphate carboxylase/oxygenase
                     large subunit"
                     /protein_id="AFV95053.1"
                     /db_xref="GI:410947002"
                     /translation="DRYESGVIPYAKMGYWDASYAVKTTDVLALFRITPQPGVDPVEA
                     AAAVAGESSTATWTVVWTDLLTACDRYRAKAYRVDPVPNTTDQFFAFIAYECDLFEEG

Hi Daniel, did you find something here that works? I am looking to extract the references from genbank files as well. I have read all the links here and am not having success with creating a perl script that works. Thanks!


Please read the first answer by Ryan and comments underneath for the solution.


Hello Neilfws. I have looked at those references and it is not straightforward for me. I have the following code that gives me one reference title, but my genbank file has many sequences.

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $io = Bio::SeqIO->new(-file => "sequence.gb", -format => "genbank" );
my $seq_obj = $io->next_seq();
my $anno_collection = $seq_obj->annotation;

for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
        if ($value->tagname eq "reference") {
            print "title: ",$value->title(), "\n";
        }
    }
}
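
The script above calls next_seq() only once, so it only ever sees the first record in the file. A minimal sketch of one way to visit every record is to put that call in a while loop and ask the annotation collection for the 'reference' key directly. The helper name reference_titles and the command-line file argument are assumptions made for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: returns one "accession<TAB>title" string per
# reference, for every record in the GenBank file at $path.
sub reference_titles {
    my ($path) = @_;
    my $io = Bio::SeqIO->new(-file => $path, -format => 'genbank');
    my @rows;
    # next_seq() hands back one record per call and undef at end of file,
    # so this loop reaches every record rather than only the first.
    while (my $seq = $io->next_seq) {
        for my $ref ($seq->annotation->get_Annotations('reference')) {
            my $title = $ref->title // '';
            push @rows, join("\t", $seq->accession_number, $title);
        }
    }
    return @rows;
}

# Usage: perl titles.pl sequence.gb
if (@ARGV) {
    print "$_\n" for reference_titles($ARGV[0]);
}
```

Requesting 'reference' by name avoids walking every annotation key and testing tagname on each value.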
11.9 years ago
Ryan Dale 5.0k

Searching CPAN for "genbank" turns up more detailed Bio::SeqIO::genbank docs, and near the end of those is a link to a how-to on feature annotation (presumably the JOURNAL field is treated as an annotation).


And indeed, extracting the REFERENCE section (to which JOURNAL belongs) is right there in the feature annotation how-to. Just search the page for the phrase "Some Annotation objects, like Reference".
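
As a rough sketch of what that part of the how-to leads to: each REFERENCE block comes back as a Bio::Annotation::Reference object, and, as far as I know, the GenBank parser stores the JOURNAL text in that object's location() field. The helper name reference_fields and the command-line file argument are hypothetical, introduced only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: flatten one Bio::Annotation::Reference into a hash.
# Assumption: JOURNAL is exposed through location(), not a journal() method.
sub reference_fields {
    my ($ref) = @_;
    return (
        authors => $ref->authors  // '',
        title   => $ref->title    // '',
        journal => $ref->location // '',
    );
}

# Usage: perl journals.pl sequence.gb
if (@ARGV) {
    my $io = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $io->next_seq) {
        for my $ref ($seq->annotation->get_Annotations('reference')) {
            my %field = reference_fields($ref);
            print join("\t", $seq->accession_number, $field{journal}), "\n";
        }
    }
}
```

For the record quoted in the question, this should give one line per REFERENCE block, so two lines for JQ354682.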


That's great, thanks. Found what I needed. The link to the CPAN Bio::SeqIO::genbank docs from the wiki doesn't work, so I went searching in a different direction. If I had found that page, I think I would have been sorted from the outset.

11.9 years ago

This is a more complex topic that you will need to spend some time with. Find a good guide on BioPerl in general via Google.

For example I found this good chapter: Beginning Perl for Bioinformatics: Genbank


The book just instructed me to parse the file in standard Perl, which would have been my default anyway. I was trying to use this opportunity to learn more about BioPerl, and it was just a single parameter that was catching me out. Found it now though.
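
For comparison, the plain-Perl route mentioned here can be sketched in a few lines without BioPerl. This is not the book's code, only a minimal illustration; it assumes the usual flat-file layout in which JOURNAL is indented two spaces and its continuation lines twelve. The helper name journal_lines is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect JOURNAL entries from GenBank flat-file text,
# joining wrapped continuation lines with a single space.
sub journal_lines {
    my ($text) = @_;
    my @journals;
    my $in_journal = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s{2}JOURNAL\s+(.*\S)/) {
            push @journals, $1;        # first line of a JOURNAL entry
            $in_journal = 1;
        }
        elsif ($in_journal && $line =~ /^\s{12}(.*\S)/) {
            $journals[-1] .= " $1";    # wrapped continuation line
        }
        else {
            $in_journal = 0;           # any other keyword ends the entry
        }
    }
    return @journals;
}

# Usage: perl journal_plain.pl sequence.gb
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n" for journal_lines($text);
}
```

This handles the JOURNAL field only; the BioPerl parser is the safer choice once features, locations, or multi-record bookkeeping matter.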


My mistake: on cursory examination I thought it was BioPerl, but instead it seems to use the book's own custom module, BeginPerlBioinfo.
