Hi everyone, I am new to RNA-seq data analysis and am trying to compare different quantifiers: StringTie, featureCounts and HTSeq. I have some questions and would be really grateful if someone could help me.
Questions:
I have removed the genes which have <99 read counts. Is that OK, or should I go for 9?
After removing the genes with <99 read counts I got 11802 genes from featureCounts, 11305 from HTSeq and 16502 from StringTie. (Note: for StringTie, I used prepDE.py to get gene read counts from the FPKM values.) Why did StringTie give more genes? Are the StringTie results genes or transcripts?
I have used the Ensembl GRCh38 FASTA and GTF files.
Personally I feel that removing all with <99 is a bit stringent but there is no rule of thumb for this.
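To get a feeling for how much the cutoff matters, you could compare how many genes survive at each threshold. Below is a minimal sketch on made-up toy data, not your real counts. It assumes the cutoff is applied to the total count summed across samples (the original post does not say whether it was per sample or in total), and `filter_by_total` is just an illustrative helper name.

```python
def filter_by_total(counts, min_total):
    """Keep genes whose count summed across all samples is >= min_total.

    counts: dict mapping gene ID -> list of per-sample read counts.
    """
    return {gene: values for gene, values in counts.items()
            if sum(values) >= min_total}


# Toy count table (hypothetical genes, three samples each).
toy_counts = {
    "geneA": [0, 3, 2],        # total 5
    "geneB": [40, 35, 50],     # total 125
    "geneC": [4, 5, 6],        # total 15
    "geneD": [300, 280, 310],  # total 890
}

# Compare the two cutoffs discussed in the thread.
for threshold in (9, 99):
    kept = filter_by_total(toy_counts, threshold)
    print(f"min total {threshold}: {len(kept)} genes kept -> {sorted(kept)}")
```

Running the same comparison on each tool's count table would show whether the gap between the tools shrinks or grows with the cutoff.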
Can you post the exact cmdlines you're executing so we can see if there is something different in those?
Here are the commands:
cmds look OK at first sight.
Did you try evaluating the different modes of htseq-count? The mode can influence the end result, and it may handle reads differently from the other software you applied.
StringTie performs reference-based transcriptome assembly, so most probably you also get counts for newly assembled genes/transcripts. You can use gffcompare to compare the original GTF file with the one produced by StringTie, to see how many new transcripts it has assembled and counted.
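As a quick complement to gffcompare, you could also check how many gene IDs in the StringTie count table are absent from the reference GTF. The following is only a rough sketch under some assumptions: the function names are made up for illustration, the GTF and IDs are toy examples, and it assumes that count-table IDs may look like `ID|NAME`, in which case only the part before `|` is compared.

```python
import re

# Matches the gene_id attribute in the 9th column of a GTF line.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def reference_gene_ids(gtf_lines):
    """Collect the set of gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def split_known_novel(quantified_ids, ref_ids):
    """Split quantified gene IDs into those present in the reference and the rest."""
    known, novel = [], []
    for gene_id in quantified_ids:
        base = gene_id.split("|", 1)[0]  # assumption: "ID|NAME" style is possible
        (known if base in ref_ids else novel).append(gene_id)
    return known, novel


# Toy reference annotation and toy StringTie-style gene IDs.
toy_gtf = [
    "#!genome-build GRCh38\n",
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY_1"; gene_name "A";\n',
    '1\tensembl\tgene\t2000\t2900\t.\t-\t.\tgene_id "ENSG_TOY_2"; gene_name "B";\n',
]
toy_ids = ["ENSG_TOY_1|A", "MSTRG.12", "ENSG_TOY_2"]

known, novel = split_known_novel(toy_ids, reference_gene_ids(toy_gtf))
print(f"{len(known)} known, {len(novel)} not in reference: {novel}")
```

With real data you would pass the open reference GTF file as `gtf_lines` and the first column of the gene count table as `quantified_ids`. A large "not in reference" group would be consistent with newly assembled loci inflating the StringTie gene count.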