Hi,
I assembled and annotated the genome of a eukaryotic organism. Genes and their putative proteins were predicted with AUGUSTUS, and the predicted proteins were annotated using BLASTP against the RefSeq database. For some proteins, the "product" field contains a species name in square brackets, for example "[Leishmania infantum JPCM5]", because that is how the name appears in the RefSeq database. My question is whether this species name should be removed from the GTF file, since it is not my species. On the other hand, if I remove it, the "protein_id" field would still point to the RefSeq protein whose full name includes the bracketed species name. I have added part of my GTF file below as an example.
jcf7180000024611 AUGUSTUS gene 2158 2691 1 - . gene_id "LPASSIMC3V1_1";
jcf7180000024611 AUGUSTUS mRNA 2158 2691 1 - . gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611 AUGUSTUS stop_codon 2158 2160 . - 0 gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611 AUGUSTUS CDS 2161 2691 1 - 0 gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611 AUGUSTUS start_codon 2689 2691 . - 0 gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611 AUGUSTUS gene 3930 4637 1 - . gene_id "LPASSIMC3V1_2"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS mRNA 3930 4637 1 - . gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS stop_codon 3930 3932 . - 0 gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS CDS 3933 4637 1 - 0 gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS start_codon 4635 4637 . - 0 gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS gene 5850 6671 1 - . gene_id "LPASSIMC3V1_3"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS mRNA 5850 6671 1 - . gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611 AUGUSTUS stop_codon 5850 5852 . - 0 gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
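
In case removing the bracketed name turns out to be the right approach, here is a minimal sketch of how it could be stripped from the "product" attribute while leaving "protein_id" untouched. The function name strip_species_tag and the regular expression are my own assumptions, not part of AUGUSTUS or any GTF tool, and the sketch assumes the bracketed tag is always the last thing inside the quoted product value, as in the lines above.

```python
import re

# Assumption: the species tag is a trailing "[...]" just before the closing
# quote of the product value, as in the RefSeq-derived names shown above.
_SPECIES_TAG = re.compile(r'(product "[^"]*?)\s*\[[^\]"]*\]"')


def strip_species_tag(gtf_line: str) -> str:
    """Return the GTF line with a trailing [Species] tag removed from product.

    Lines without a product attribute, or without a bracketed tag, are
    returned unchanged. The protein_id attribute is never modified.
    """
    return _SPECIES_TAG.sub(r'\1"', gtf_line)


if __name__ == "__main__":
    example = (
        'gene_id "LPASSIMC3V1_2"; '
        'product "hypothetical protein, unknown function '
        '[Leishmania infantum JPCM5]"; protein_id "XP_001467570";'
    )
    print(strip_species_tag(example))
```

This only edits the text of the product name; whether the cleaned name is acceptable still depends on the naming rules of wherever the annotation will be submitted.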