Hi everyone.
I would really appreciate a hand with the following:
I am trying to convert 1911 GenBank (GBK) files to GFF3 and FASTA.
For this purpose, I installed BioPerl (1.7.8) and the Bio::DB::GFF module (1.7.4) via CPAN in Strawberry Perl for Windows.
When I run the following command:
perl bp_genbank2gff3.pl --dir path_to_files --noinfer --split --outdir path_to_output
the script appears to run fine, but only 1270 new files are created: 635 GFF3 and 635 FASTA files.
1) Is something terminating the process before it has processed all the files? If so, how can I fix it?
It may be relevant that the output files do not keep the names of the GenBank input files, which are systematically indexed (e.g. "gen011_ctg009_region001.gbk") so that they can be tracked. Instead, the outputs seem to be named after the LOCUS line of each GenBank file, and many of those are called "contigX" or "scaffo_000000X". This makes me think that, because of these duplicate names, the outputs for 1276 of the input files (1911 - 635 = 1276) were overwritten, since when scrolling through the prompt I see no sign of any input being skipped.
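To check whether duplicate names explain the missing files, the LOCUS names can be counted across the input directory. Below is a minimal sketch in Python (not Perl, and not part of BioPerl), assuming the inputs end in .gbk and that the record name is the first whitespace-separated token after the LOCUS keyword. If it reports far fewer distinct names than records (for example, around 635), overwriting in --outdir would be the likely explanation.

```python
"""Count how many records in a directory of .gbk files share each LOCUS name.

Assumption: the --split output of bp_genbank2gff3.pl is named after the
LOCUS name, so records with the same name would overwrite each other.
"""
import sys
from collections import Counter
from pathlib import Path


def locus_names(path: Path) -> list[str]:
    """Return the name token of every LOCUS line in one GenBank file."""
    names = []
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("LOCUS"):
                fields = line.split()
                if len(fields) > 1:
                    names.append(fields[1])
    return names


def collision_report(gbk_dir: str) -> Counter:
    """Map each LOCUS name to the number of records that use it."""
    counts = Counter()
    for path in sorted(Path(gbk_dir).glob("*.gbk")):
        counts.update(locus_names(path))
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    counts = collision_report(sys.argv[1])
    print(f"{sum(counts.values())} records, {len(counts)} distinct LOCUS names")
    # List only the names used by more than one record.
    for name, n in counts.most_common():
        if n > 1:
            print(f"{name}\t{n}")
```

Usage (the script name is hypothetical): python count_locus.py path_to_files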
2) Is there a parameter that makes the script use the input file name as the name of the output files?
In particular, I need the input name to appear in the header (description) line of the FASTA files and in the first column (seqid) of the GFF3 files, so it might make more sense to change the LOCUS line of the GBK files themselves.
3) If there is no solution to question 2, do you have any ideas for achieving this other than by hand?
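One way to avoid editing files by hand is to rewrite the LOCUS name of each GBK file to the file's own name before running the conversion. The following is a hypothetical Python helper, not part of BioPerl, that writes renamed copies into a separate directory so the originals stay untouched. It assumes, as observed above, that bp_genbank2gff3.pl takes both the output file names and the GFF3 seqid from the LOCUS name; if some records carry a meaningful ACCESSION line, the script might use that instead, so checking one converted file before the full batch is advisable.

```python
"""Write copies of GenBank files whose LOCUS name is the file's own stem.

Hypothetical helper (not part of BioPerl): run it first, then point
bp_genbank2gff3.pl --dir at the output directory.
"""
import re
import sys
from pathlib import Path

# LOCUS keyword plus spacing, the current name token, and the rest of the line.
LOCUS_RE = re.compile(r"^(LOCUS\s+)(\S+)(.*)$")


def rename_locus(text: str, new_name: str) -> str:
    """Replace the name on each LOCUS line; add _1, _2, ... if there are several records."""
    lines = text.splitlines(keepends=True)
    n_records = sum(1 for line in lines if line.startswith("LOCUS"))
    record = 0
    out = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        match = LOCUS_RE.match(body)
        if match:
            record += 1
            # Unique names are needed so multi-record files don't collide again.
            name = new_name if n_records == 1 else f"{new_name}_{record}"
            body = f"{match.group(1)}{name}{match.group(3)}"
        out.append(body + ending)
    return "".join(out)


def rewrite_dir(src_dir: str, dst_dir: str) -> int:
    """Copy every .gbk in src_dir to dst_dir with LOCUS renamed to the file stem."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.gbk")):
        text = path.read_text(encoding="utf-8")
        (dst / path.name).write_text(rename_locus(text, path.stem), encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    print(f"rewrote {rewrite_dir(sys.argv[1], sys.argv[2])} files")
```

Usage (directory and script names are hypothetical): python rename_locus.py path_to_files path_to_renamed, then run bp_genbank2gff3.pl with --dir path_to_renamed. Note that the name field of a strictly formatted LOCUS line is 16 characters wide, so a name like gen011_ctg009_region001 shifts the remaining columns; testing the conversion on a single file first is a sensible precaution.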
Thanks in advance.