NCBI GTF contains the same transcript ID for several transcripts, causing trouble with htseq-count
4.0 years ago

Hi everyone,

I am doing a differential expression analysis between two groups. I used the NCBI RefSeq GTF (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/263/795/GCF_002263795.1_ARS-UCD1.2/GCF_002263795.1_ARS-UCD1.2_genomic.gtf.gz) and ran HISAT2 and StringTie without the -e option.

When I looked at the StringTie output GTFs, both the individual ones and the merged one, some transcripts had the transcript ID "unknown_transcript_1". I then checked the original NCBI GTF and found that some records have a blank gene_id and the transcript_id "unknown_transcript_1". These records are mostly on the mitochondrial genome and on scaffold sequences. As a result, when I ran htseq-count, the first row of the output had an empty gene_id with a read count.

Should I exclude that row from the htseq-count output before running the differential expression analysis? I have done the same workflow with the Ensembl GTF and there was no issue like that.
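A minimal sketch of two ways to keep these unnamed records out of the count matrix, assuming the GTF attribute column uses the usual `gene_id "...";` form and that the htseq-count output is the standard two-column, tab-separated file whose summary rows start with `__` (`__no_feature`, `__ambiguous`, and so on). The function names are hypothetical helpers, not part of HTSeq or StringTie. Option 1 removes records with an empty or missing gene_id from the GTF before counting; option 2 leaves the GTF alone and drops the empty-ID row (and the `__` summary rows) when loading the counts.

```python
import gzip
import re

# Match gene_id "<value>" only at the start of the attribute column or right
# after a ';', so StringTie's ref_gene_id attribute is not mistaken for it.
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id "([^"]*)"')


def gtf_gene_id(line):
    """Return the gene_id value of a GTF record ('' if blank), or None if absent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    match = GENE_ID_RE.search(fields[8])
    return match.group(1) if match else None


def filter_gtf_lines(lines):
    """Option 1: yield comment lines and only records with a non-empty gene_id."""
    for line in lines:
        if line.startswith("#"):
            yield line
        elif gtf_gene_id(line):  # drops both blank and missing gene_id
            yield line


def filter_gtf_file(src_path, dst_path):
    """Write a filtered copy of a (possibly gzipped) GTF to dst_path."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src))


def filter_counts(lines):
    """Option 2: parse htseq-count output, skipping empty IDs and __ summary rows."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        gene, _, value = line.partition("\t")
        if not gene.strip() or gene.startswith("__"):
            continue
        counts[gene] = int(value)
    return counts
```

For example, `filter_gtf_file("genomic.gtf.gz", "genomic.filtered.gtf")` (hypothetical file names) would produce a GTF without the `gene_id ""` records to pass to htseq-count, while `filter_counts(open("sample.counts"))` would load an existing count file without the empty row. Which option is preferable is a judgment call; the first keeps the problem records out of the whole pipeline, the second only changes the final matrix.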

I would really appreciate any suggestions or recommendations you can give.

Thanks in advance.

-Bhaumik

RNA-Seq rna-seq Assembly alignment sequencing

@patelbhaumikn please contact RefSeq to report this issue with GTF files.
