Hi~ I used the HISAT2-StringTie pipeline to process RNA-seq data and got results with MSTRG tags. Some of them had gene names, which is convenient for functional annotation later. But about 1/3 of my rows had only an MSTRG tag, like this:
chr6 StringTie transcript 72101340 72101890 1000 - . gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";
chr6 StringTie exon 72101340 72101890 1000 - . gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1"; exon_number "1";
Are there any suggestions on how to deal with them?
Thanks! Aoi
Really appreciate your reply. I used the GTF file from Ensembl. The transcripts listed above have location information but no reference annotation gene IDs. So is it appropriate to drop them and keep only those with gene symbols for further analysis? Thanks!
It depends on the end goal of the study. If you are interested only in standard transcripts/genes (i.e. Ensembl, all or targeted), it is okay to exclude MSTRG transcripts/genes from downstream analysis. But do not throw those genes/transcripts away. Try to analyze these coordinates with care. They might be partial and/or novel transcripts/genes, or they may be annotated in other databases.
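If it helps, here is a minimal Python sketch of how the two groups could be separated so the MSTRG-only transcripts are kept aside for a later look instead of being discarded. It assumes a tab-separated GTF and assumes that annotated transcripts carry a `ref_gene_id` or `gene_name` attribute; check which attribute names your own file actually uses. The function names are made up for illustration.

```python
import re

# Matches GTF attribute pairs such as: gene_id "MSTRG.58117";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def split_transcripts(gtf_lines):
    """Split transcript rows into (annotated, novel) lists.

    Assumption: a transcript counts as annotated if it has a
    ref_gene_id or gene_name attribute; everything else is novel.
    Each record is (chrom, start, end, strand, transcript_id).
    """
    annotated, novel = [], []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # keep transcript rows only, ignore exon rows
        attrs = parse_attributes(cols[8])
        record = (cols[0], int(cols[3]), int(cols[4]), cols[6],
                  attrs.get("transcript_id"))
        if "ref_gene_id" in attrs or "gene_name" in attrs:
            annotated.append(record)
        else:
            novel.append(record)
    return annotated, novel


if __name__ == "__main__":
    # The transcript row from the question, rebuilt with tab separators.
    example = ["\t".join([
        "chr6", "StringTie", "transcript", "72101340", "72101890",
        "1000", "-", ".",
        'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";',
    ])]
    annotated, novel = split_transcripts(example)
    print(len(annotated), "annotated;", len(novel), "novel")
```

The coordinates in the novel list can then be compared against other annotation sources or inspected in a genome browser.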
Hello everyone, I have the same problem as above (in my case Cuffdiff gives gene IDs but the specific gene names are missing). I used the reference.gtf file at every step. I also tried to find the gene names using their chromosome locus coordinates but found nothing, and I ran BLAST as well. I could not get any information from the databases. Please advise what steps I should take to find the gene names. I need gene names for downstream analysis.
Hi, divya~ I think you can check whether the reference.gtf matches your data. If there are no gene names for any sequence, one possible reason is that the reference.gtf and the genome used for your bowtie index come from different builds (hg38 and hg19, for example).
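One quick check along these lines is to compare the sequence names in the assembled GTF with those in reference.gtf. This is only a rough sketch with made-up function names, and it only catches naming mismatches (for example `chr6` in one file versus `6` in the other). It cannot tell hg19 coordinates from hg38 coordinates when the names agree.

```python
def seqnames(gtf_lines):
    """Collect the distinct values of the first GTF column."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(assembled_lines, reference_lines):
    """Report which sequence names are shared and which are unique."""
    assembled = seqnames(assembled_lines)
    reference = seqnames(reference_lines)
    return {
        "shared": assembled & reference,
        "only_assembled": assembled - reference,
        "only_reference": reference - assembled,
    }


if __name__ == "__main__":
    # Tiny made-up inputs; in practice pass open file handles instead.
    assembled = ["chr6\tStringTie\ttranscript\t72101340\t72101890\t1000\t-\t.\t"]
    reference = ["6\tensembl\tgene\t100\t200\t.\t+\t.\t"]
    report = compare_seqnames(assembled, reference)
    if not report["shared"]:
        print("No sequence names in common; the files may not match.")
```

If the shared set is empty or very small, the annotation and the alignments probably do not refer to the same reference naming, which would explain missing gene names.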