Hi
I want to compare the per-gene read counts between the output of STAR (with quantMode) and StringTie (combined with the prepDE.py script). Transcripts and exons can overlap, so I only compared counts for single-exon, single-transcript genes between StringTie and STAR. The total read count from STAR is four times higher than from StringTie, the correlation is 0.95, and there are a few genes with only a few reads in STAR but many reads in StringTie. Does anyone have an idea how to explain that? I thought the read counts should be the same, or at least the correlation should be 1.
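For reference, the comparison described above can be sketched roughly as follows. The gene IDs and counts here are made-up toy numbers (in practice the STAR counts would be parsed from its ReadsPerGene.out.tab and the StringTie counts from prepDE.py's gene_count_matrix.csv); the point is just restricting to shared genes, then computing the total-count ratio and Pearson correlation:

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-gene counts standing in for the two tools' outputs;
# geneD mimics the "few reads in STAR, many in StringTie" outliers.
star_counts      = {"geneA": 400, "geneB": 220, "geneC": 80, "geneD": 12}
stringtie_counts = {"geneA": 100, "geneB": 55,  "geneC": 20, "geneD": 250}

# Restrict to genes present in both tables.
shared = sorted(star_counts.keys() & stringtie_counts.keys())
xs = [star_counts[g] for g in shared]
ys = [stringtie_counts[g] for g in shared]

print("total-count ratio (STAR/StringTie):", sum(xs) / sum(ys))
print("Pearson correlation:", pearson(xs, ys))
```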
Thanks in advance.
These comparisons take time and effort. Unless you really need to know, you could just use STAR and then featureCounts like the rest of the world and call it a day, focusing on the actual analysis.
Thanks for your reply. After digging into this problem for a long time, I still couldn't explain it to myself clearly. However, one assumption could be that StringTie estimates the coverage level of a transcript by solving a maximum-flow problem, which determines the maximum number of fragments that can be associated with the chosen transcript. That affects the read count and coverage reported for each region. STAR, on the other hand, simply counts the reads that fall within the region. https://www.nature.com/articles/nbt.3122#Sec2:~:text=Second%2C%20StringTie%20estimates,in%20the%20ASG.
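To illustrate why a maximum-flow assignment can report fewer fragments than a raw overlap count, here is a toy sketch (not StringTie's actual implementation): a plain Edmonds-Karp max-flow over a hypothetical two-exon path, where the edge capacities are made-up numbers standing in for read support at each step. The flow value (fragments a flow-based estimator would assign) is bounded by the weakest edge, while an overlap count just tallies every read touching the first exon:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow. capacity: {u: {v: cap}}."""
    # Mutable residual copy, plus zero-capacity reverse edges.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u in list(residual):
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path from source to sink.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Bottleneck capacity along the path found.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical numbers: 10 reads overlap exon 1, but the
# junction into exon 2 only supports 6 fragments.
capacity = {
    "S":  {"e1": 10},  # reads entering exon 1 (raw overlap count)
    "e1": {"e2": 6},   # junction support between the exons
    "e2": {"T": 8},    # reads leaving via exon 2
}
print("raw overlap count:", sum(capacity["S"].values()))   # 10
print("max-flow assignment:", max_flow(capacity, "S", "T"))  # 6
```

So the same gene can legitimately get a smaller count from a flow-based estimator than from a pure overlap counter, which is consistent with the totals differing between the two tools.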
If you want transcript-level estimates, simply use salmon or kallisto.