Calculate Number of Reads per class code (cufflinks)
1
0
Entering edit mode
9.3 years ago

Dear all,

Does anybody know how to calculate the total number of reads "used" by each transcript class code (=, c, j, i, etc) from cuffcompare?

Thank you in advance.

Belisa

cuffcompare • 1.9k views
ADD COMMENT
1
Entering edit mode
9.3 years ago
michael.ante ★ 3.9k

Hi Guisa,

If you need the number of reads I would avoid cufflinks. Anyway you can use awk's associative array to make an approximation:

awk 'NR>1{class[$3]+=($10*$11)}END{for(i in class){print i"\t"class[i]}}' cuffcompare.tmap

You add per line the average read-depth times the length to the value of each class.

Cheers,

Michael

ADD COMMENT
0
Entering edit mode

Thank you Michael. This is exactly what I have done (also using awk :) So now I know that my reasoning is correct.

(?) My issue with this approach is that the sum of the reads estimated this way add-up to more than the total number of mapped reads... So I do do not know how to discuss this difference: my total number of aligned pair reads is 1,259,591,756 and the sum of all estimated reads (from all class codes from all tmap files) is 3,732,338,824.4 - these values are quite different). Do you have any clue why this is?

ADD REPLY
0
Entering edit mode

Argh; I think you have to divide by the read length / alignment length.

ADD REPLY
0
Entering edit mode

Yes, i know, these values are already divided by the read length (in my case 101 bp)...

ADD REPLY
1
Entering edit mode

You can use htseq-count with the combined gtf as reference counting the uniquely mapping reads per class_code:

htseq-count ... -t class_code my.bam cufflinks.combined.gtf
ADD REPLY
0
Entering edit mode

Great suggestion, thanks! I will give it a try (and then post the results :)

ADD REPLY

Login before adding your answer.

Traffic: 1994 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6