Question

RPKM in Masked Sequences

12.7 years ago

Leandro Lima ▴ 970

Hello.

I'm confused about calculating RPKM when the target sequence is masked.

The post linked below discusses what to use for "N" in

RPKM = 10^9 * C / (N * L), where C is the number of reads mapped to the transcript, L is the length of the transcript in bases, and N is the total read count of the sample.

http://seqanswers.com/forums/showthread.php?t=7630

And what about L? Does it change when the sequence is masked?

Thank you.

rpkm • 2.4k views


score 1 · Answer 1 · 2013-02-22

12.7 years ago

Damian Kao 16k

If you mapped the reads to the masked sequence, then I would use the masked length (the transcript length minus the masked bases) to calculate the RPKM.

The rationale is that you are effectively removing reads that would have mapped to that masked region, so you should also remove the length of the masked region when normalizing by length.
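As a rough illustration of the point above, here is a minimal Python sketch that computes RPKM with both the full and the masked length. It assumes hard masking with "N" characters; the function names (`unmasked_length`, `rpkm`), the toy transcript, and the read counts are all made up for the example and do not come from any particular tool.

```python
def unmasked_length(seq, mask_chars="Nn"):
    """Count bases that are not hard-masked (assumption: masked bases are N/n)."""
    return sum(1 for base in seq if base not in mask_chars)


def rpkm(read_count, length_bp, total_reads):
    """RPKM = 10^9 * C / (N * L), with L in bases and N the total read count."""
    if length_bp <= 0 or total_reads <= 0:
        raise ValueError("length_bp and total_reads must be positive")
    return 1e9 * read_count / (total_reads * length_bp)


# Hypothetical transcript: 2,000 bp, of which 500 bp are hard-masked.
transcript = "ACGT" * 375 + "N" * 500
reads_on_transcript = 300        # C (made-up value)
total_mapped_reads = 20_000_000  # N (made-up value)

full_len = len(transcript)               # L ignoring the mask
masked_len = unmasked_length(transcript)  # L with masked bases removed

rpkm_full = rpkm(reads_on_transcript, full_len, total_mapped_reads)
rpkm_masked = rpkm(reads_on_transcript, masked_len, total_mapped_reads)

print(f"full length   = {full_len} bp, RPKM = {rpkm_full:.2f}")
print(f"masked length = {masked_len} bp, RPKM = {rpkm_masked:.2f}")
```

With these made-up numbers the full-length value comes out lower than the masked-length value, because the masked 500 bp cannot collect reads but would still count toward L.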
