Hi all,
I downloaded .embl files from The SEED and am trying to extract features from them using Biopython.
For example, from the following excerpt of an EMBL file, I'm trying to get the value of the /product qualifier on each CDS feature:
ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 9430 BP.
XX
AC   unknown;
XX
DE   Contig AMTS01000351 from Escherichia coli FDA506
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..9430
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon: 1005474"
FT                   /genome_md5="b6a2d1d1a41be1cf3128536aecba12be"
FT                   /project="mshukla_1005474"
FT                   /genome_id="1005474.3"
FT                   /organism="Escherichia coli FDA506"
FT   CDS             154..432
FT                   /db_xref="SEED:fig|1005474.3.peg.3831"
FT                   /translation="MKTKIVKGKTTKQDVLASFGEPDSRSLIDGEEQWSYTMYNSQSKA
FT                   TSFIPVVGLLAGGADSQTKSLTVSFKGEKVSTYIFNAGTSNVKTGIF"
FT                   /product="hypothetical lipoprotein"
...
I've been using SeqIO.parse to get the sequence records and looking at record.features, but that's not giving me the /product string:
import sys
from Bio import SeqIO
for record in SeqIO.parse(sys.argv[1], "embl"):
    print(record.id, record.features)
The output is something like this:
unknown.1 [SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(9430), strand=1), type='source'), SeqFeature(FeatureLocation(ExactPosition(153), ExactPosition(432), strand=1), type='CDS'), SeqFeature(FeatureLocation(ExactPosition(507), ExactPosition(1710), strand=-1), type='CDS'),
...
I think there is a way to do this in BioPerl, but what's the equivalent in Biopython?
Thanks for any advice you might have!
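
Follow-up, in case it helps anyone with the same problem: as far as I can tell, printing record.features only shows each feature's type and location, while the qualifiers are stored separately on each feature in feature.qualifiers, a dictionary keyed by qualifier name without the leading slash, with a list of strings as each value. Below is a minimal sketch under that assumption. The names iter_products and main are my own (hypothetical), and I restrict it to CDS features because that is where /product appears in the excerpt above.

```python
import sys


def iter_products(records):
    """Yield (record id, feature location, product) for every CDS feature.

    `records` is any iterable of SeqRecord-like objects, for example the
    iterator returned by Bio.SeqIO.parse(path, "embl").
    """
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Qualifier values are lists; a CDS without /product yields nothing.
            for product in feature.qualifiers.get("product", []):
                yield record.id, feature.location, product


def main(path):
    # Imported here so iter_products can be used without Biopython installed.
    from Bio import SeqIO

    for record_id, location, product in iter_products(SeqIO.parse(path, "embl")):
        print(record_id, location, product)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the .embl file as the only argument. Using .get("product", []) rather than feature.qualifiers["product"] avoids a KeyError on features that carry no /product qualifier.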