After running StringTie/featureCounts, the output is an expression value for every gene present in the annotation file: 60,675 genes in GRCh38.p5, to be specific. For finding differentially expressed genes, is it good to work with all the genes? Is there any chance of duplicate genes in that number? There is no Ensembl ID duplication.
P.S.: Kindly suggest some good papers that have done DEG studies with the new Tuxedo suite.
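For the two checks asked about above (duplicates, and whether to keep every gene), here is a minimal sketch in plain Python. Everything in it is hypothetical: the gene IDs, gene names, and counts are toy values, the helper names are made up for illustration, and the `min_count=10` / `min_samples=2` thresholds are assumptions rather than a recommendation from any tool.

```python
from collections import Counter


def strip_version(gene_id):
    """Drop a trailing '.N' version suffix, if present (e.g. 'ID.3' -> 'ID')."""
    return gene_id.split(".")[0]


def duplicated_values(values):
    """Return the sorted list of values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def filter_low_counts(counts_by_gene, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if sum(1 for c in counts if c >= min_count) >= min_samples
    }


# Toy annotation: unique IDs, but two IDs share one gene name.
gene_ids = ["GENE_ID_A.1", "GENE_ID_B.2", "GENE_ID_C.1", "GENE_ID_D.4"]
gene_names = ["NAME1", "NAME2", "NAME2", "NAME3"]

# Toy count matrix: gene ID -> raw read counts across four samples.
raw_counts = {
    "GENE_ID_A.1": [0, 0, 1, 0],
    "GENE_ID_B.2": [50, 60, 0, 70],
    "GENE_ID_C.1": [12, 3, 15, 2],
    "GENE_ID_D.4": [0, 0, 0, 0],
}

dup_ids = duplicated_values([strip_version(g) for g in gene_ids])
dup_names = duplicated_values(gene_names)
kept = filter_low_counts(raw_counts)

print("duplicated IDs:", dup_ids)
print("duplicated names:", dup_names)
print("genes kept after filtering:", sorted(kept))
```

Unique IDs do not rule out repeated gene names, which is why the sketch checks the two separately. Genes with zero or near-zero counts in every sample carry no information for a differential test, so they are often dropped before testing.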
What is your question?
60K genes in hg38, are you sure?
@OP: did you use only known genes while aligning/calling transcripts? Did you check for novel transcripts?
Haven't touched novel transcripts yet. Looking to find DEGs among the known genes. The problem is that DESeq2 reports ~18k DEGs and Ballgown ~900 at p<0.05.
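One thing worth checking when two tools disagree this much at p<0.05 is whether both numbers come from the same kind of p-value, raw or multiple-testing adjusted. Whether that explains the gap here is an assumption, not something established in this thread. The sketch below implements the Benjamini-Hochberg adjustment in plain Python on made-up p-values, just to show how much the count below 0.05 can change.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for six hypothetical genes.
raw_p = [0.001, 0.02, 0.03, 0.04, 0.2, 0.5]
adj_p = benjamini_hochberg(raw_p)

n_raw = sum(1 for p in raw_p if p < 0.05)
n_adj = sum(1 for p in adj_p if p < 0.05)
print("significant at raw p<0.05:", n_raw)
print("significant at adjusted p<0.05:", n_adj)
```

With these toy values, four genes pass the raw cutoff but only one passes after adjustment, so a count based on raw p-values is not comparable to one based on adjusted values.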
When you say "new Tuxedo suite", I believe you did the mapping with HISAT2. If yes, please share the mapping summary.
The overall alignment rate was >90% for all samples.