I am trying to convert a gbff-formatted annotation file to GFF3 so that I can use it with GBrowse. I found a tool for this, but it produced a GFF file of over 250 GB that was still growing when I killed it. How can I go about this conversion?
You have not specified what is in your gbff file, nor which tool you used for the conversion (for the latter I suspect one of the BioPerl ones), so my answer has to be generic and may or may not apply to your case.
Strictly speaking, gbff is "multiple gbk/gbf (GenBank) files concatenated". As GBrowse supports multiple databases, one solution might be to subdivide your gbff file into logical parts and create a separate, smaller gbff or gbk file for each one. Then process each file into its own database. Beware: the GBrowse database files need space too; in my environment that is between 2 and 3 times the size of the GFF3.
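As a rough illustration of the subdividing step, here is a minimal Python sketch. It relies only on the fact that each GenBank record ends with a line containing just `//`. The function name `split_genbank`, the `part_00001.gbk` naming scheme and the `records_per_file` parameter are my own inventions for this example, not part of BioPerl or GBrowse, and "one record per file" may not be the logical grouping you want.

```python
import os


def split_genbank(path, out_dir, records_per_file=1):
    """Split a concatenated GenBank flat file into smaller files.

    Hypothetical helper: a record is taken to end at a line that
    contains only '//' (the GenBank record terminator).
    Returns the list of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    records_in_file = 0
    with open(path) as src:
        for line in src:
            if out is None:
                if not line.strip():
                    continue  # skip blank lines between records
                name = os.path.join(out_dir, "part_%05d.gbk" % (len(written) + 1))
                out = open(name, "w")
                written.append(name)
                records_in_file = 0
            out.write(line)
            if line.rstrip() == "//":
                records_in_file += 1
                if records_in_file >= records_per_file:
                    out.close()
                    out = None
    if out is not None:
        out.close()  # last file held fewer records or lacked a terminator
    return written
```

Something like `split_genbank("annotation.gbff", "parts", records_per_file=50)` would then give files small enough to convert and inspect one at a time (the file and directory names here are placeholders).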
Did you compare the input and the output for the initial records? I'm curious what they looked like. Typically, I would expect a GFF file to be smaller than a GenBank file, since the GenBank format is more verbose. That said, the genbank2gff3 script is far from perfect, and its output often needs to be tweaked to be valid GFF3. This is not the fault of the authors: GenBank files vary so widely in their content that the converter can't anticipate what every author of GenBank files will do.
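One quick way to look at the initial records of the output is to count how often identical lines occur near the top of the GFF file. The sketch below is only a diagnostic idea, not something the converter provides; the function name `repeated_gff_lines` and its `max_lines` and `top` parameters are made up for this example. It reads only the first `max_lines` lines, so it is safe to run on a file that is still growing.

```python
from collections import Counter
from itertools import islice


def repeated_gff_lines(path, max_lines=100000, top=5):
    """Return the most frequent duplicated lines near the start of a GFF file.

    Hypothetical helper: comment/directive lines (starting with '#')
    and blank lines are ignored; only lines seen more than once are
    reported, as (line, count) pairs.
    """
    counts = Counter()
    with open(path) as fh:
        for line in islice(fh, max_lines):
            if line.startswith("#") or not line.strip():
                continue
            counts[line.rstrip("\n")] += 1
    return [(line, n) for line, n in counts.most_common(top) if n > 1]
```

If this returns the same feature line thousands of times, the growth comes from repeated output rather than from GFF3 being a bulkier format, and a small excerpt of those lines together with the matching GenBank record makes a good bug report.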
Finally, I would suggest that a better place to ask this question is the BioPerl mailing list, with sample data that reproduces the strange behavior (whatever it is).
I tried to create a GFF3 file from a .gbk file using bp_genbank2gff3.pl, but what I get is the same features repeated many times, and the file keeps growing until my hard disk is full. I have tried filtering out all features except "region", but it still repeats a single entry many times. I have attached part of the generated file. Please help.
I am using bp_genbank2gff3.pl for the conversion. How does one go about subdividing the gbff file? What I was wondering is how a ~1 GB gbff file turns into a >250 GB GFF file. Is there a lot of redundant information in the second format, or am I missing some parameter in the conversion?