I need to be able to convert easily between GFF/GTF plus a reference sequence and either EMBL or GenBank format, in both directions. Are there any frequently used tools for this, or should I script something myself?
The EMBOSS tool seqret is one option. For example:
Generating an EMBL-Bank style entry from a FASTA sequence and a GFF feature table:
seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat embl -auto
Alternatively, to get a GenBank style entry:
seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat genbank -auto
To go the other way, getting the sequence in FASTA format and the features as GFF, use something like:
seqret -sformat embl -sequence aj242600.dat -feature -osformat fasta -offormat gff -auto
Please note that because these conversions start from a sequence plus features, they do not create a full EMBL-Bank or GenBank style entry: a full entry requires additional information, such as references, that is not available in the source data.
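If you end up running these conversions over many files, a thin wrapper can save some typing. The following is only a minimal sketch in Python: the function names (build_seqret_to_flatfile, build_seqret_to_gff, run) are made up for illustration, and the seqret options are exactly the ones used in the commands above. Passing anything other than "embl" to -sformat in the reverse direction is an assumption that has not been checked here.

```python
import shlex
import shutil
import subprocess


def build_seqret_to_flatfile(fasta, gff, flavour):
    """Argument list for FASTA + GFF -> EMBL or GenBank style entry."""
    if flavour not in ("embl", "genbank"):
        raise ValueError("flavour must be 'embl' or 'genbank'")
    return [
        "seqret",
        "-sequence", fasta,
        "-feature",
        "-fformat", "gff",
        "-fopenfile", gff,
        "-osformat", flavour,
        "-auto",
    ]


def build_seqret_to_gff(entry, entry_format="embl"):
    """Argument list for a flat-file entry -> FASTA sequence + GFF features."""
    return [
        "seqret",
        "-sformat", entry_format,
        "-sequence", entry,
        "-feature",
        "-osformat", "fasta",
        "-offormat", "gff",
        "-auto",
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it only if dry_run is False and seqret exists."""
    print(shlex.join(cmd))
    if dry_run:
        return None
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("seqret not found on PATH (is EMBOSS installed?)")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: shows the commands without touching any files.
    run(build_seqret_to_flatfile("aj242600.fasta", "aj242600.gff", "embl"))
    run(build_seqret_to_flatfile("aj242600.fasta", "aj242600.gff", "genbank"))
    run(build_seqret_to_gff("aj242600.dat"))
```

With dry_run left at True, this just prints the three command lines, which is a cheap way to check them before letting the script loose on real data.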
Hi, I would like to extract data in GenBank format from a genome FASTA file and a GFF file with coordinates. Could anybody help me?
It would be best to ask this as a separate question.
Bedtools can extract the FASTA subsequences for the coordinates in the GFF file (see its getfasta subcommand).
get bedtools from here
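If you would rather script the extraction step yourself, the idea is simple enough to sketch with the Python standard library alone. This is not how bedtools does it, just a rough illustration: the function names and the toy data are made up, the whole genome is read into memory, and GFF coordinates are treated as 1-based and inclusive. It only returns subsequences and does not write GenBank output.

```python
import io

# Complement table used for features on the minus strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a FASTA file into a dict of {sequence id: sequence}."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first word of the header
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def gff_subsequences(fasta_handle, gff_handle, stranded=True):
    """Yield (label, feature type, subsequence) for each GFF feature line."""
    genome = read_fasta(fasta_handle)
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line
        seqid, _source, ftype, start, end, _score, strand = cols[:7]
        start, end = int(start), int(end)
        seq = genome[seqid][start - 1:end]  # 1-based inclusive -> Python slice
        if stranded and strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        yield f"{seqid}:{start}-{end}({strand})", ftype, seq


if __name__ == "__main__":
    # Toy data, invented for this example.
    fasta = io.StringIO(">chr1 demo\nACGTACGTAC\nGGGTTT\n")
    gff = io.StringIO(
        "##gff-version 3\n"
        "chr1\t.\tgene\t2\t5\t.\t+\t.\tID=g1\n"
        "chr1\t.\tgene\t11\t16\t.\t-\t.\tID=g2\n"
    )
    for label, ftype, seq in gff_subsequences(fasta, gff):
        print(f">{label} {ftype}\n{seq}")
```

For real genomes an indexed lookup (which is what bedtools relies on) will be much kinder to memory than loading everything into a dict.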