Question

Calculate Number of Reads per class code (cufflinks)

9.3 years ago

guisa.santos • 0

Dear all,

Does anybody know how to calculate the total number of reads "used" by each transcript class code (=, c, j, i, etc) from cuffcompare?

Thank you in advance.

Belisa

cuffcompare • 1.9k views

Updated 2.2 years ago by Ram 44k • written 9.3 years ago by guisa.santos • 0

Ram · Answer 1 · 2015-09-01

9.3 years ago

michael.ante ★ 3.9k

Hi Guisa,

If you need the number of reads, I would avoid cufflinks. That said, you can use an awk associative array to get an approximation:

awk 'NR>1{class[$3]+=($10*$11)}END{for(i in class){print i"\t"class[i]}}' cuffcompare.tmap

For each line, this adds the average read depth (cov, column 10) multiplied by the transcript length (len, column 11) to the running total for that line's class code (column 3).

Cheers,

Michael

9.3 years ago by michael.ante ★ 3.9k
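
For reference, the approximation above can be wrapped in a small shell function. This is only a sketch under stated assumptions: `reads_per_class` and its two arguments are hypothetical names, the input is assumed to be a tab-delimited cuffcompare .tmap file with one header line (class_code in column 3, cov in column 10, len in column 11), and the sum is divided by a read length you supply, as suggested later in the thread. The result is an estimate of reads per class code, not a true count.

```shell
# Hypothetical helper (not part of cufflinks): estimate reads per class code.
# Usage: reads_per_class FILE.tmap READ_LENGTH
# Assumes a tab-delimited cuffcompare .tmap with a header line and
# class_code in column 3, cov in column 10, len in column 11.
reads_per_class() {
    awk -F'\t' -v read_len="$2" '
        NR > 1 { bases[$3] += $10 * $11 }   # covered bases = depth x length
        END {
            for (code in bases)
                printf "%s\t%.1f\n", code, bases[code] / read_len
        }
    ' "$1" | LC_ALL=C sort
}
```

For example, `reads_per_class cuffcompare.tmap 101` would use the 101 bp read length mentioned in the thread.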

Thank you, Michael. This is exactly what I have done, also using awk :) So now I know that my reasoning is correct.

My issue with this approach is that the reads estimated this way add up to more than the total number of mapped reads, and I do not know how to explain the difference: my total number of aligned pair reads is 1,259,591,756, while the sum of all estimated reads (from all class codes across all tmap files) is 3,732,338,824.4. These values are quite different. Do you have any clue why this is?

Reply updated 2.2 years ago by Ram 44k • written 9.3 years ago by guisa.santos • 0

Argh, I think you have to divide by the read length (or the alignment length).

Reply 9.3 years ago by michael.ante ★ 3.9k

Yes, I know; these values are already divided by the read length (101 bp in my case)...

Reply 9.3 years ago by guisa.santos • 0

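You can use htseq-count with the combined GTF as the reference to count the uniquely mapping reads per class_code. Since class_code is a GTF attribute, it is selected with -i (--idattr); -t selects the feature type in column 3:

htseq-count ... -i class_code my.bam cufflinks.combined.gtf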