9.3 years ago
guisa.santos
Dear all,
Does anybody know how to calculate the total number of reads "used" by each transcript class code (=, c, j, i, etc) from cuffcompare?
Thank you in advance.
Belisa
Thank you Michael. This is exactly what I have done (also using awk :) So now I know that my reasoning is correct.
My issue with this approach is that the reads estimated this way add up to more than the total number of mapped reads, and I do not know how to discuss this difference: my total number of aligned paired-end reads is 1,259,591,756, while the sum of all estimated reads (from all class codes in all tmap files) is 3,732,338,824.4, roughly three times as many. Do you have any clue why this is?
Argh; I think you have to divide by the read length / alignment length.
Yes, I know; these values are already divided by the read length (101 bp in my case)...
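For anyone following along, here is a minimal Python sketch of the estimate discussed above: for each transcript, reads ≈ coverage × length / read length, summed per class code. It assumes the tmap file is tab-separated with a header row containing `class_code`, `cov` and `len` columns; check this against your own cuffcompare output. The function name `reads_per_class_code` and the toy rows are made up for illustration, and only the 101 bp read length comes from this thread.

```python
import csv
import io
from collections import defaultdict


def reads_per_class_code(tmap_text, read_length):
    """Sum estimated reads (cov * len / read_length) per class code.

    Assumes a tab-separated table with a header row that includes the
    columns 'class_code', 'cov' and 'len' (an assumption about the
    .tmap layout, not verified here).
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(tmap_text), delimiter="\t")
    for row in reader:
        estimated = float(row["cov"]) * float(row["len"]) / read_length
        totals[row["class_code"]] += estimated
    return dict(totals)


# Toy, made-up rows (not real data), using only the columns needed here.
toy_tmap = (
    "class_code\tcuff_id\tcov\tlen\n"
    "=\tTCONS_1\t10.0\t1010\n"
    "j\tTCONS_2\t2.0\t505\n"
    "=\tTCONS_3\t1.0\t202\n"
)

totals = reads_per_class_code(toy_tmap, read_length=101)
for code, n_reads in sorted(totals.items()):
    print(code, round(n_reads, 1))
```

To use it on a real file, read the file contents into a string first and pass that in. This is an estimate from coverage, not an actual read count, so it need not match the number of mapped reads.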
You can use htseq-count with the combined gtf as reference counting the uniquely mapping reads per class_code:
Great suggestion, thanks! I will give it a try (and then post the results :)