Hi,
I want to repair a GTF file by adding a unique string (such as the product name) to each empty gene_id "". I would really appreciate it if anyone could suggest a solution.
For example:
grep -m1 'gene_id ""' mygtf.gtf
NC_001717.1 RefSeq exon 1004 1071 . + . **gene_id ""**; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
I want to add the product name between the double quotes right after the gene_id like:
NC_001717.1 RefSeq exon 1004 1071 . + . gene_id "tRNA-Phe"; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
I have 24 empty gene_id fields and need to fix all of them. I obtained this file from NCBI RefSeq. Unfortunately, this species is not available in the Ensembl database.
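For reference, the replacement described above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested pipeline: it assumes every affected line carries a non-empty product "..." attribute, and the function names fill_empty_gene_id and repair_gtf are made up for illustration. Note that product names can repeat across loci, so the resulting gene_id values are not guaranteed to be unique.

```python
import re

# Matches the value of the product attribute, e.g. product "tRNA-Phe";
PRODUCT = re.compile(r'product "([^"]*)"')


def fill_empty_gene_id(line: str) -> str:
    """Return the line with an empty gene_id replaced by its product value.

    Hypothetical helper: comment lines, lines whose gene_id is not empty,
    and lines without a usable product attribute are returned unchanged.
    """
    if line.startswith("#") or 'gene_id ""' not in line:
        return line
    match = PRODUCT.search(line)
    if match is None or not match.group(1):
        return line  # nothing to borrow; leave the line for manual review
    return line.replace('gene_id ""', f'gene_id "{match.group(1)}"', 1)


def repair_gtf(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path, filling empty gene_id fields.

    Returns the number of lines that were changed.
    """
    fixed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            new_line = fill_empty_gene_id(line)
            if new_line != line:
                fixed += 1
            dst.write(new_line)
    return fixed
```

Calling repair_gtf("mygtf.gtf", "mygtf.fixed.gtf") (the output name is just a placeholder) should report how many lines were changed, which can be compared against the 24 expected. Any line still matching 'gene_id ""' afterwards had no product attribute and would need to be handled by hand.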
The reason I want to fix the GTF file in the first place is so that I can filter it with cellranger mkgtf. I am getting the error below, so I need to modify the GTF file.
cellranger.reference.GtfParseError: Error while parsing GTF file /~/genome/mygtf.gtf Property 'gene_id' is empty in GTF line 1809658: NC_001717.1 RefSeq exon 1004 1071 . + gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
Thank you!
Thank you so much for your suggestion. And, yes, my idea to use the product name was not great. I was able to convert my GTF file using the AGAT functions you listed above. Thanks again!