After running the Braker2 pipeline, how would one go about converting the braker.gtf output plus FASTA into GenBank format?
I found this post suggesting EMBOSS seqret, so I set about converting from .gtf to .gff. To do this I used AGAT, as suggested in this post. However, when doing this I get a lot of gff3 reader errors. For example:
The feature type (3rd column) is constrained to be either a term from the Sequence Ontology or an SO accession number. The latter alternative is distinguished using the syntax SO:000000. In either case, it must be sequence_feature (SO:00
We filter the ontology to apply this rule. We found 1757 terms that are sequence_feature or is_a child of it.
-------------------------------- parse features --------------------------------
=> GFF version parser used: 2.5
gff3 reader error level1: No ID attribute found @ for the feature: IC0001_1 AUGUSTUS gene 3320954 3321577 1 - .
gff3 reader error level2: No ID attribute found @ for the feature: IC0001_1 AUGUSTUS transcript 3320954 3321577 1 - .
WARNING level2: No Parent attribute found @ for the feature: IC0001_1 AUGUSTUS transcript 3320954 3321577 1 - . ID "transcript-1"
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the
IC0001_1 AUGUSTUS transcript 3320954 3321577 1 - . ID "transcript-1"
gff3 reader error level1: No ID attribute found @ for the feature: IC0001_2468 AUGUSTUS gene 2 442 0.78 - .
gff3 reader error level2: No ID attribute found @ for the feature: IC0001_2468 AUGUSTUS transcript 2 442 0.78 - .
WARNING level2: No Parent attribute found @ for the feature: IC0001_2468 AUGUSTUS transcript 2 442 0.78 - . ID "transcript-2"
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the
IC0001_2468 AUGUSTUS transcript 2 442 0.78 - . ID "transcript-2"
gff3 reader error level1: No ID attribute found @ for the feature: IC0001_1 AUGUSTUS gene 11900730 11901159 1 - .
gff3 reader error level2: No ID attribute found @ for the feature: IC0001_1 AUGUSTUS transcript 11900730 11901159 1 - .
WARNING level2: No Parent attribute found @ for the feature: IC0001_1 AUGUSTUS transcript 11900730 11901159 1 - . ID "transcript-3"
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the
IC0001_1 AUGUSTUS transcript 11900730 11901159 1 - . ID "transcript-3"
gff3 reader error level1: No ID attribute found @ for the feature: IC0001_180 AUGUSTUS gene 3084 5230 0.23 + .
gff3 reader error level2: No ID attribute found @ for the feature: IC0001_180 AUGUSTUS transcript 3084 5230 0.23 + .
WARNING level2: No Parent attribute found @ for the feature: IC0001_180 AUGUSTUS transcript 3084 5230 0.23 + . ID "transcript-4"
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the
IC0001_180 AUGUSTUS transcript 3084 5230 0.23 + . ID "transcript-4"
gff3 reader error level1: No ID attribute found @ for the feature: IC0001_1386 AUGUSTUS gene 494 973 0.65 - .
gff3 reader error level2: No ID attribute found @ for the feature: IC0001_1386 AUGUSTUS transcript 494 973 0.65 - .
WARNING level2: No Parent attribute found @ for the feature: IC0001_1386 AUGUSTUS transcript 494 973 0.65 - . ID "transcript-5"
I don't understand why the tool is unable to identify the features when they are there. Is it perhaps because GTF allows feature names that GFF3 does not? My worry is that this will affect the final GenBank files.
EDIT: Here is a sample of the gtf https://pastebin.com/CEyfqR1H
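One thing that may be worth checking before converting is what the 9th column looks like for each feature type, since the messages above complain about missing ID and Parent attributes. The following is only a minimal sketch: `summarize_attributes` is a hypothetical helper (not part of AGAT or BRAKER), it assumes tab-separated lines with nine columns, and the three sample lines are made up for illustration rather than taken from the pastebin.

```python
import re

# Matches a GTF-style key/value attribute such as: gene_id "g1";
GTF_ATTRIBUTE = re.compile(r'\b\w+\s+"[^"]*"')


def summarize_attributes(lines):
    """Count, per feature type, lines whose 9th column holds key/value pairs."""
    summary = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        feature_type = cols[2] if len(cols) > 2 else "?"
        attributes = cols[8] if len(cols) > 8 else ""
        counts = summary.setdefault(feature_type, {"key_value": 0, "other": 0})
        if GTF_ATTRIBUTE.search(attributes):
            counts["key_value"] += 1
        else:
            counts["other"] += 1  # bare value or missing column
    return summary


# Hypothetical sample lines, for illustration only.
sample = [
    "seq1\tAUGUSTUS\tgene\t1\t100\t1\t+\t.\tg1",
    "seq1\tAUGUSTUS\ttranscript\t1\t100\t1\t+\t.\tg1.t1",
    'seq1\tAUGUSTUS\tCDS\t1\t100\t1\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
]
for feature_type, counts in summarize_attributes(sample).items():
    print(feature_type, counts)
```

To run it on a real file, pass an open file handle instead of `sample`, for example `summarize_attributes(open("braker.gtf"))`. Feature types that land mostly in the `other` bucket are the ones for which a parser would have to infer IDs and parents itself.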
Could you provide a sample of your GTF file? It sounds like the last column (the 9th) is missing, so AGAT is creating the relationships (Parent/ID) for the features sequentially: reading line by line, an mRNA coming after a gene will be linked to that gene, and so on.
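To make the idea of sequential linking concrete, here is a rough sketch of what "strictly sequential" could mean. This is not AGAT's actual code: `link_sequentially` is a hypothetical function, the `type-N` ID scheme simply mimics the `transcript-1` style seen in the log, and the rule that every non-gene, non-transcript feature attaches to the most recent transcript is an assumption made for illustration.

```python
def link_sequentially(features):
    """Assign IDs and Parents purely from file order.

    features: list of (feature_type, start, end) tuples in file order.
    Returns a list of dicts with generated "ID" and "Parent" values.
    """
    counters = {}            # running count per feature type
    current_gene = None      # ID of the most recent gene seen
    current_transcript = None  # ID of the most recent transcript seen
    linked = []
    for feature_type, start, end in features:
        counters[feature_type] = counters.get(feature_type, 0) + 1
        feature_id = f"{feature_type}-{counters[feature_type]}"
        if feature_type == "gene":
            parent = None
            current_gene = feature_id
            current_transcript = None  # a new gene starts a new locus
        elif feature_type in ("transcript", "mRNA"):
            parent = current_gene
            current_transcript = feature_id
        else:
            parent = current_transcript  # e.g. CDS, exon, intron
        linked.append({
            "type": feature_type, "start": start, "end": end,
            "ID": feature_id, "Parent": parent,
        })
    return linked


# Coordinates borrowed from the log above; the CDS line is hypothetical.
for record in link_sequentially([
    ("gene", 2, 442),
    ("transcript", 2, 442),
    ("CDS", 2, 442),
    ("gene", 494, 973),
    ("transcript", 494, 973),
]):
    print(record["ID"], "Parent:", record["Parent"])
```

The point of the sketch is that this kind of inference only works when the file is ordered gene, then transcript, then sub-features; if the lines were shuffled, the inferred parents would be wrong.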
I have added a sample of the gtf.
OK, I have updated my answer. No worries, your conversion went well ^^