I am attempting (with BioPerl) to extract the JOURNAL field from a set of GenBank records, but I can't find a reference listing the accessor methods that are available. For example:
while (my $seq = $in->next_seq() ) {
    print $seq->accession . "\n";
}
prints the accession number,
while (my $seq = $in->next_seq() ) {
    print $seq->desc . "\n";
}
prints the description, and
while (my $seq = $in->next_seq() ) {
    print $seq->seq . "\n";
}
prints the gene sequence, and so on.
I gleaned these from the BioPerl site and from other questions, as I can't find a reference for the whole scheme. Can anyone point me in the right direction? Unfortunately, http://www.bioperl.org/wiki/Module:Bio::SeqIO::genbank is a dead end.
Thanks
For reference:
LOCUS       JQ354682                1420 bp    DNA     linear   PLN 01-JAN-2013
DEFINITION  Gomphonema clevei strain TCC507 ribulose-1,5-bisphosphate
            carboxylase/oxygenase large subunit (rbcL) gene, partial cds;
            chloroplast.
ACCESSION   JQ354682
VERSION     JQ354682.1  GI:410947001
KEYWORDS    .
SOURCE      chloroplast Gomphonema clevei
  ORGANISM  Gomphonema clevei
            Eukaryota; Stramenopiles; Bacillariophyta; Bacillariophyceae;
            Bacillariophycidae; Cymbellales; Gomphonemataceae; Gomphonema.
REFERENCE   1  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Using a polyphasic approach to explore the diversity and
            geographical distribution of the Gomphonema parvulum (Kutzing)
            Kutzing complex (Bacillariophyta)
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-JAN-2012) Asconit Consultants, 3 bld de Clairfont
            Bat. G, Toulouges F-66350, France
FEATURES             Location/Qualifiers
     source          1..1420
                     /organism="Gomphonema clevei"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /strain="TCC507"
                     /isolation_source="river"
                     /db_xref="taxon:1223578"
                     /country="Mayotte"
                     /collection_date="20-Apr-2009"
     gene            <1..>1420
                     /gene="rbcL"
     CDS             <1..>1420
                     /gene="rbcL"
                     /codon_start=1
                     /transl_table=11
                     /product="ribulose-1,5-bisphosphate carboxylase/oxygenase
                     large subunit"
                     /protein_id="AFV95053.1"
                     /db_xref="GI:410947002"
                     /translation="DRYESGVIPYAKMGYWDASYAVKTTDVLALFRITPQPGVDPVEA
                     AAAVAGESSTATWTVVWTDLLTACDRYRAKAYRVDPVPNTTDQFFAFIAYECDLFEEG
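
A minimal sketch of one way to reach the JOURNAL field of a record like the one above, assuming BioPerl is installed. The GenBank parser in Bio::SeqIO stores each REFERENCE block as a Bio::Annotation::Reference object in the sequence's annotation collection under the key 'reference', and that object's location() method returns the JOURNAL text. The helper name references_from_genbank and the file name records.gb are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Return one hash per REFERENCE block found in a GenBank file.
# references_from_genbank is a hypothetical helper name, not a BioPerl function.
sub references_from_genbank {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @rows;
    # Outer loop: every record in the file.
    while (my $seq = $in->next_seq) {
        # Inner loop: every REFERENCE block of that record.
        for my $ref ($seq->annotation->get_Annotations('reference')) {
            push @rows, {
                accession => $seq->accession_number,
                authors   => $ref->authors,
                title     => $ref->title,
                journal   => $ref->location,    # the JOURNAL field
            };
        }
    }
    return @rows;
}

# Usage: perl script.pl records.gb   (records.gb is a placeholder file name)
for my $file (@ARGV) {
    for my $row (references_from_genbank($file)) {
        printf "%s\t%s\n", $row->{accession}, $row->{journal} // '';
    }
}
```

Other header fields live in the same annotation collection. Calling $seq->annotation->get_all_annotation_keys on a record should list the keys that record actually carries, which is a practical substitute for a single reference page.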
Hi Daniel, did you find something here that works? I am looking to extract the references from GenBank files as well. I have read all the links here but have not managed to write a Perl script that works. Thanks!
Please read the first answer by Ryan and comments underneath for the solution.
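
For anyone who finds the BioPerl object model hard to follow, the REFERENCE blocks can also be pulled out with plain Perl. This is a different technique from the BioPerl route discussed in this thread: a line-oriented sketch that assumes the standard flat-file layout, with top-level keywords at the start of the line, sub-keywords such as JOURNAL indented by a few spaces, and continuation lines indented further. The name parse_references is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read GenBank flat-file text from a filehandle and return one hash per
# REFERENCE block, across every record in the input. Plain Perl, no BioPerl.
sub parse_references {
    my ($in) = @_;
    my (@found, $accession, $ref, $field);
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^ACCESSION\s+(\S+)/) {
            $accession = $1;                      # remember the current record
            ($ref, $field) = (undef, undef);
        }
        elsif ($line =~ /^REFERENCE\s+(\d+)/) {
            $ref = { accession => $accession, number => $1 };
            push @found, $ref;
            $field = undef;
        }
        elsif ($ref && $line =~ /^\s{1,4}(AUTHORS|CONSRTM|TITLE|JOURNAL|MEDLINE|PUBMED|REMARK)\s+(.*)/) {
            $field = lc $1;                       # sub-keyword inside a REFERENCE
            $ref->{$field} = $2;
        }
        elsif ($ref && $field && $line =~ /^\s{5,}(\S.*)/) {
            $ref->{$field} .= " $1";              # continuation of the previous field
        }
        elsif ($line =~ /^\S/) {
            ($ref, $field) = (undef, undef);      # any other top-level keyword ends the block
        }
    }
    return @found;
}

# Usage: perl script.pl records.gb   (records.gb is a placeholder file name)
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    printf "%s\t%s\t%s\n", $_->{accession} // '', $_->{number}, $_->{journal} // ''
        for parse_references($fh);
    close $fh;
}
```

Because the state is reset at every top-level keyword, the same loop handles a file with many records. It depends on the indentation being intact, so it will not work on text whose leading whitespace has been stripped.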
Hello Neilfws. I have looked at those references and it is not straightforward for me. I have the following code that gives me one reference title, but my GenBank file has many sequences.