Question

RNA-seq quantification with StringTie, featureCounts and HTSeq: am I correct?


6.2 years ago

k.kathirvel93 ▴ 310

Hi everyone, I am new to RNA-seq data analysis and am trying to compare different quantifiers: StringTie, featureCounts and HTSeq. I have some questions and would be really happy if someone could help me.

Questions:

I have removed the genes that have <99 read counts. Is that OK, or should I go for 9?
After removing the genes with <99 read counts, I got 11802 genes from featureCounts, 11305 from HTSeq and 16502 from StringTie. (Note: for StringTie, I used prepDE.py to convert the FPKM values to gene read counts.) Why did StringTie give more genes? Are the StringTie results genes or transcripts?

I used the Ensembl GRCh38 FASTA and GTF files.

RNA-Seq rna-seq gene next-gen genome • 6.1k views



Personally I feel that removing all genes with <99 reads is a bit stringent, but there is no rule of thumb for this.

Can you post the exact command lines you're executing so we can see whether something differs between them?
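
To see how much the cutoff itself drives the gene totals, one option is to apply several thresholds to each tool's counts and look at which genes differ. This is only a minimal sketch: the gene names and counts below are made-up toy values, and `filter_genes` is a hypothetical helper, not part of any of the tools discussed.

```python
def filter_genes(counts, min_reads):
    """Return the set of genes whose read count is at least min_reads."""
    return {gene for gene, n in counts.items() if n >= min_reads}


# Hypothetical toy counts standing in for real featureCounts / HTSeq tables.
featurecounts = {"GAPDH": 5200, "ACTB": 4100, "TP53": 85, "MYC": 12, "XIST": 0}
htseq = {"GAPDH": 5100, "ACTB": 3990, "TP53": 101, "MYC": 8, "XIST": 0}

# Compare the two cutoffs mentioned in the question (drop <9 vs. drop <99).
for min_reads in (9, 99):
    kept_fc = filter_genes(featurecounts, min_reads)
    kept_ht = filter_genes(htseq, min_reads)
    # Genes kept by only one of the two tools at this cutoff.
    only_one_tool = sorted(kept_fc ^ kept_ht)
    print(min_reads, len(kept_fc), len(kept_ht), only_one_tool)
```

Genes whose counts sit close to the cutoff can land on different sides of it in different tools, so part of the difference in totals may come from the threshold rather than from the quantifiers.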

ADD REPLY • link 6.2 years ago by lieven.sterck 15k


Here are the commands:

featureCounts -T 16 -p -g gene_name -a /home/kathirvel/Homo_sapiens.GRCh38.77.gtf -o /home/kathirvel/FeatureCounts/MAQC_Counts.csv /home/kathirvel/out.bam

htseq-count -i gene_name -m intersection-nonempty -f bam /home/kathirvel/out.bam /home/kathirvel/Homo_sapiens.GRCh38.77.gtf > /home/kathirvel/Counts.csv

stringtie -p 16 -e -G /home/kathirvel/Homo_sapiens.GRCh38.77.gtf -B -o /home/kathirvel/MAQC_ILM_BGI_A1_1_transcripts.gtf -A /home/kathirvel/gene_abundances.csv /home/kathirvel/out.bam

ADD REPLY • link 6.2 years ago by k.kathirvel93 ▴ 310


The commands look OK at first sight.

Did you try evaluating the overlap mode used by htseq-count? It can influence the end result and might differ from what the other tools apply.
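
As a rough illustration of why the mode matters, here is a simplified sketch of the three htseq-count overlap modes as I understand them from the HTSeq documentation. It is not HTSeq's actual code: `assign_read` is a hypothetical function, and each read is reduced to a list of feature sets, one per covered position.

```python
def assign_read(feature_sets, mode="union"):
    """Assign a read to a gene given the set of genes at each covered position.

    Simplified model of the htseq-count modes; returns a gene name,
    "__no_feature" or "__ambiguous".
    """
    if mode == "union":
        candidates = set().union(*feature_sets)
    elif mode == "intersection-strict":
        candidates = set.intersection(*feature_sets) if feature_sets else set()
    elif mode == "intersection-nonempty":
        non_empty = [s for s in feature_sets if s]
        candidates = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not candidates:
        return "__no_feature"
    if len(candidates) > 1:
        return "__ambiguous"
    return next(iter(candidates))


# Toy read lying fully inside geneA and partly inside geneB.
overlapping = [{"geneA"}, {"geneA", "geneB"}]
# Toy read lying partly inside geneA and partly outside any annotated gene.
overhanging = [{"geneA"}, set()]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(overlapping, mode), assign_read(overhanging, mode))
```

In this toy model the same read is counted, dropped as ambiguous, or dropped as no_feature depending on the mode, which is one way per-gene totals can drift apart between tools.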

ADD REPLY • link 6.2 years ago by lieven.sterck 15k


StringTie performs reference-based transcriptome assembly, so most probably you also get counts for newly assembled genes/transcripts. You can use gffcompare to compare the original GTF file with the one produced by StringTie and see how many new transcripts it has assembled and counted.
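
Before running gffcompare, a quick first check is to see whether the StringTie GTF contains transcript IDs that are absent from the reference GTF. The sketch below is a crude stand-in for gffcompare, not a replacement: it compares IDs only, not structures. The GTF lines and IDs are made-up toy data, and `transcript_ids` is a hypothetical helper. Note also that the command posted above uses `-e`, which as far as I know limits StringTie to the reference transcripts given with `-G`, so in that case this check would be expected to find no new IDs.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect transcript_id values from the 'transcript' records of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only look at transcript-level records
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


# Hypothetical toy GTF records; real input would be read from the two files.
reference = [
    '1\tensembl\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "ENST0001";',
    '1\tensembl\ttranscript\t2000\t2900\t.\t-\t.\tgene_id "G2"; transcript_id "ENST0002";',
]
assembled = [
    "# StringTie output (toy example)",
    '1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "ENST0001";',
    '1\tStringTie\ttranscript\t5000\t5600\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";',
]

# IDs present in the assembly but not in the reference annotation.
novel = transcript_ids(assembled) - transcript_ids(reference)
print(len(novel), sorted(novel))
```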

ADD REPLY • link 6.2 years ago by grant.hovhannisyan ★ 2.6k