I tried to get the information about read counts from the BAM files using both cuffdiff (isoforms.count_tracking file) and featurecounts using the same BAM files and gtf file and for several transcripts they show completely different results. Is it normal or something is going wrong?
I tried to run featurecounts with different parameters: with or without duplicates, stranded and non stranded but the result still doesn't match cuffdiff counts.
From Cuffnorm manual (probably applies to Cuffdiff as well):
Cuffnorm will report both FPKM values and normalized, estimates for
the number of fragments that originate from each gene, transcript, TSS
group, and CDS group. Note that because these counts are already
normalized to account for differences in library size, they should not
be used with downstream differential expression tools that require raw
counts as input.
Yes, I saw it but can it be so different that one transcript has 0 counts in featurecounts and several thousands in cuffdiff counts? And then what to use cuffdiff counts for? Clearly you cannot input them in edgeR or DESeq right?
Cuffdiff tries to estimates the counts and resolves ambiguity when the reads overlaps different transcripts of the same gene. Featurecounts might simply discard all the reads that have an ambiguous overlaps with different transcripts/genes. If you use the '-O' option in featureCounts, you would see some counts.
Do you have IGV screen shot of the transcript of interest and how the reads map there ? That would clearly show why there are differences. Use 'expanded' section for annotations and 'squished' and 'view as pairs' for the reads if you plan to post the IGV screenshot.
Yes, I saw it but can it be so different that one transcript has 0 counts in featurecounts and several thousands in cuffdiff counts? And then what to use cuffdiff counts for? Clearly you cannot input them in edgeR or DESeq right?
Not sure why the difference is so high.
Cuffdiff counts could be used as an alternative to FPKMs, but not normalized to gene length.