Hi everyone,
I am running a differential expression analysis between two groups. I used the NCBI RefSeq GTF https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/263/795/GCF_002263795.1_ARS-UCD1.2/GCF_002263795.1_ARS-UCD1.2_genomic.gtf.gz and ran HISAT2 and StringTie without the -e option. When I looked at the StringTie output GTFs, both the per-sample files and the merged file, some transcripts have the transcript ID "unknown_transcript_1". I then checked the original NCBI GTF, and some records there have a blank gene_id and a transcript_id of "unknown_transcript_1". They are mostly from the mitochondrial genome and the scaffold sequences. As a result, when I ran htseq-count, the first row of the output has an empty gene_id with read counts. Should I exclude that row from the htseq-count output before doing the differential expression analysis? I did the same analysis with the Ensembl GTF and there was no such issue.
I would really appreciate any suggestions or recommendations.
Thanks in advance.
-Bhaumik
@patelbhaumikn please contact RefSeq to report this issue with GTF files.
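To put together a report like that, it may help to list exactly which records in the GTF have an empty gene_id. Below is a minimal Python sketch, assuming the standard nine-column, tab-separated GTF layout with attributes written as key "value"; pairs. The function names (parse_attributes, empty_gene_id_records, summarize, open_gtf) are made up for illustration and are not part of HTSeq or StringTie. It only reports the affected records; it does not decide whether they should be excluded.

```python
import gzip
import re
from collections import Counter

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return a dict of the key/value pairs found in a GTF attribute column."""
    return dict(ATTR_RE.findall(field))


def empty_gene_id_records(lines):
    """Yield (seqname, feature, transcript_id) for records whose gene_id is empty or missing."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_id", "") == "":
            yield cols[0], cols[2], attrs.get("transcript_id", "")


def summarize(lines):
    """Count affected records per (seqname, transcript_id) pair."""
    return Counter((seq, tid) for seq, _, tid in empty_gene_id_records(lines))


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Example usage (the path is whatever local copy of the GTF you downloaded):
# with open_gtf("GCF_002263795.1_ARS-UCD1.2_genomic.gtf.gz") as handle:
#     for (seqname, transcript_id), n in summarize(handle).most_common():
#         print(seqname, transcript_id, n, sep="\t")
```

The per-sequence counts should show whether the affected records really are limited to the mitochondrial and scaffold sequences, which is useful detail to include when contacting RefSeq.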