Hi everyone
I have a fasta file where the header typically looks like this:
>FBpp0071678 type=protein;loc=2R:join(18050425..18051199,18052282..18052494,18056749..18058222,18058283..18059490,18059587..18059757,18059821..18059938,18060002..18060032); ID=FBpp0071678; name=a-PB; parent=FBgn0000008,FBtr0071764; dbxref=FlyBase_Annotation_IDs:CG6741-PB,FlyBase:FBpp0071678,GB_protein:AAF46809.2,REFSEQ:NP_524641,GB_protein:AAF46809; MD5=9eb6e9e4c12ec62fdeb31cca5b0683b6; length=1329; release=r5.57; species=Dmel;
________________________________________________________________________________________________________________________________________________________________________________________________________^----------------^
The file has thousands of sequences and I need to search according to the parent field of the header (I've highlighted it in the header I'm showing). Basically I'm searching a sequence based on the FBgn number.
I'm using bioperl's Bio::DB::Fasta
module but from what I understand when the db indexes my database it only recognizes the first non-space characters as the identifier and than everything after the white space is ignored.
Anyways is there any function in bioperl to search the database through the description field of the header?
Or would I have to make a perl script to change all the headers so that it fits my current specific criteria.
Thanks!
Thanks for the comment but I was looking for a way to index my DB with different identifiers in bioperl.