Hello everyone, I'm working on an RNA-seq data set obtained from a bacterium, and I have followed a pipeline described for HISAT2 using StringTie and Ballgown. My problem is how to convert the transcript IDs generated by StringTie (STRG0001 or MSTRG000) to actual gene names. The subsequent differential expression analysis also reports the results with MSTRG or STRG as gene names. I tried to parse the GFF file and match these transcripts to gene IDs, but I have observed that every GFF file is different and the same script doesn't work on other files. I would be really thankful if you could help me out with this, because without the proper gene names my analysis is incomplete. Is there a way to map gene names or symbols to the transcripts? I feel I'm missing a step, or is there a method to be followed? I would greatly appreciate the help.
If you use a reference annotation file (-G parameter) at the time of transcript assembly using StringTie, you should see the names from your reference GTF in the StringTie output. Only novel genes/transcript variants end up with the STRG identifiers.
Thanks for the reply. I do use the same command with -G, but what I get is the reference IDs like transcript_id "rna-XM_029034609.1"; gene_id "gene-CJI97_002588", and in the case of novel transcripts I get these StringTie IDs.
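To check which StringTie gene IDs correspond to annotated genes and which are novel, one option is to read the attribute column of the StringTie GTF directly. The following is a minimal Python sketch, not a tested pipeline step. It assumes that transcripts matching the reference carry a ref_gene_id attribute in the GTF; please confirm that against your own output. The function names and the two example lines are made up for illustration, reusing the IDs quoted above.

```python
import re
from collections import defaultdict

# GTF attributes look like: key "value"; key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def stringtie_to_reference(gtf_lines):
    """Map each StringTie gene_id to the set of reference gene IDs it overlaps.

    Assumption: reference-matching transcripts carry a ref_gene_id attribute.
    Genes that never show up in the result are treated as novel.
    """
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_gtf_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        ref_gene = attrs.get("ref_gene_id")
        if gene_id and ref_gene:
            mapping[gene_id].add(ref_gene)
    return dict(mapping)


# Hypothetical example lines (coordinates and contig name are invented).
example = [
    'contig1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "rna-XM_029034609.1"; '
    'ref_gene_id "gene-CJI97_002588";',
    'contig1\tStringTie\ttranscript\t2000\t2900\t1000\t+\t.\t'
    'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";',
]
print(stringtie_to_reference(example))
```

With a real file you would pass an open file handle instead of the example list. A set is used per gene because one StringTie locus can overlap more than one reference gene.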
I inspected the GFF file, and it has the gene locus and product but no gene symbol or gene ID, which is probably why I'm getting transcript IDs like the ones above. Is this a problem that everyone faces, or is it just me? I have not yet come across any other question like this.
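If the GFF only provides a locus and a product, those can still serve as readable labels for the gene IDs. Below is a rough Python sketch that builds a gene ID to locus_tag/product table from GFF3 lines. It assumes the file uses the GFF3 ID and Parent attributes and that the attribute names are locus_tag and product; other files may use different keys, so adjust as needed. The function names and the example lines (including the product text) are invented for illustration.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Turn the 9th GFF3 column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gene_annotation_table(gff_lines):
    """Return {gene ID: {"locus_tag": ..., "product": ...}}.

    The product is often stored on a child feature (mRNA or CDS), so it is
    carried up to the gene by following Parent links.
    """
    genes = {}       # gene ID -> annotation dict
    parent_of = {}   # feature ID -> its (first) parent ID
    products = []    # (parent ID, product text) pairs seen on child features
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_gff3_attributes(cols[8])
        fid = attrs.get("ID")
        # Simplification: only the first parent is used if several are listed.
        parent = attrs.get("Parent", "").split(",")[0] or None
        if cols[2] == "gene" and fid:
            genes[fid] = {"locus_tag": attrs.get("locus_tag", ""), "product": ""}
        if fid and parent:
            parent_of[fid] = parent
        if parent and "product" in attrs:
            products.append((parent, attrs["product"]))
    for node, product in products:
        seen = set()
        # Climb Parent links until a gene is reached (guarding against cycles).
        while node not in genes and node in parent_of and node not in seen:
            seen.add(node)
            node = parent_of[node]
        if node in genes and not genes[node]["product"]:
            genes[node]["product"] = product
    return genes


# Hypothetical example lines; only the gene and transcript IDs come from the post.
example_gff = [
    "##gff-version 3",
    "contig1\tRefSeq\tgene\t100\t900\t.\t+\t.\t"
    "ID=gene-CJI97_002588;locus_tag=CJI97_002588",
    "contig1\tRefSeq\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=rna-XM_029034609.1;Parent=gene-CJI97_002588",
    "contig1\tRefSeq\tCDS\t100\t900\t.\t+\t0\t"
    "ID=cds-example;Parent=rna-XM_029034609.1;product=example protein",
]
print(gene_annotation_table(example_gff))
```

The resulting table can be joined to the differential expression results on the gene ID column, so each row gets a locus tag and product even when no gene symbol exists.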