Question

Very Large FPKM Values For Some Transcripts: Artifacts?

13.3 years ago

Pfs ▴ 580

I am using Cufflinks to analyze an experiment with ~135 million reads. The FPKM values range from 0 to 100,000. I did not use the -N option, which can sometimes produce inflated FPKM values, so I investigated some of the very large FPKM values.

They are generally associated with non-protein-coding genes and are approximately 100 bp long. The number of alignments covering these regions is approximately 3,000.

Using the RPKM formula, I cannot make sense of such large FPKM values.
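For reference, here is the standard RPKM/FPKM formula applied to the numbers in the question (about 3,000 fragments on a ~100 bp transcript, ~135 million mapped reads). This is a minimal sketch that assumes each read counts as one fragment and that all ~135 million reads are mapped; it shows that the plain formula gives only a few hundred, nowhere near 100,000.

```python
def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Numbers from the question (assumption: all ~135 million reads are mapped)
print(round(fpkm(3000, 100, 135_000_000), 1))  # → 222.2
```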

Any explanation? Are these artifacts? If so, how can they be detected and filtered out?

Have you tried writing to the authors (in particular, Cole Trapnell)? They are typically responsive. Of course, if they respond, we'd love to see the answer.

13.3 years ago by seidel 11k

If you are trying to quantify the abundance of many short transcripts, you might also try passing Cufflinks the --no-effective-length-correction flag.
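A likely reason this flag matters: Cufflinks divides by an effective transcript length that accounts for the fragment-length distribution, not by the raw length, so a transcript shorter than a typical fragment gets a very small denominator and a very large FPKM. The sketch below is a simplified approximation of that idea, not Cufflinks' actual code: effective length is computed as the sum over fragment lengths i ≤ L of P(i)·(L − i + 1). The normal fragment-length distribution (mean 200, sd 80, capped at 1,000 bp) and the read counts are assumptions for illustration.

```python
import math


def effective_length(length_bp: int, mean: float = 200.0, sd: float = 80.0,
                     max_frag: int = 1000) -> float:
    """Approximate effective length: sum over fragment lengths i <= L of
    P(i) * (L - i + 1), where P is a discretised normal distribution
    (an assumed fragment-length model, not estimated from data)."""
    weights = [math.exp(-0.5 * ((i - mean) / sd) ** 2)
               for i in range(1, max_frag + 1)]
    total = sum(weights)
    return sum(w / total * (length_bp - i + 1)
               for i, w in enumerate(weights, start=1) if i <= length_bp)


def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of (effective) length per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Assumed numbers, taken from the question: ~3,000 fragments, ~100 bp, ~135M reads
raw = fpkm(3000, 100, 135_000_000)
eff = effective_length(100)
corrected = fpkm(3000, eff, 135_000_000)
print(f"effective length ~ {eff:.1f} bp")
print(f"FPKM with raw length: {raw:.0f}, with effective length: {corrected:.0f}")
```

Under these assumptions the 100 bp transcript's effective length comes out well under 10 bp, so its FPKM is more than ten times the naive value. A longer fragment-length distribution would shrink the effective length, and inflate the FPKM, even further.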

11.4 years ago by Rob 7.1k

score 1 · Answer 1 · 2012-02-13

13.2 years ago

Obi Griffith 20k

I'm not sure whether your FPKM values are being calculated correctly, but from looking at a lot of RNA-seq data, I would say it is normal to have some very highly expressed, short, non-coding transcripts (e.g., rRNA genes).

score 0 · Answer 2 · 2012-03-11

13.2 years ago

Mkd • 0

It also depends on the tissue. Some tissues, such as the lens and developing red blood cells, have extremely high levels of a few mRNAs.

score 0 · Answer 3 · 2012-03-12

13.2 years ago

Mikael Huss 4.8k

This has been observed by many users. Read, for example, this SeqAnswers thread, where Cole Trapnell also makes an appearance.

score 0 · Answer 4 · 2012-03-12

It depends on the sample. As already said, it might come from rRNA. Alternatively, in one of our recent sequencing runs we observed that the globin genes were highly expressed because it was a whole-blood sample: about 90% of all reads belonged to these few globin genes. So think about your sample and which genes could be very highly expressed!
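A quick way to check for this kind of dominance is to compute what fraction of all assigned reads falls into the handful of most abundant genes. This is a minimal sketch that assumes you already have per-gene read counts in a dictionary. The gene names and counts below are hypothetical, chosen so that three globin genes take about 90% of reads, as in the whole-blood run described above.

```python
def top_gene_fraction(counts: dict[str, int], n: int = 5) -> float:
    """Fraction of all assigned reads that fall in the n most abundant genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical whole-blood counts dominated by globin genes
counts = {"HBB": 5_000_000, "HBA1": 4_000_000, "HBA2": 3_500_000,
          "other_genes_combined": 900_000, "ACTB": 300_000, "GAPDH": 200_000}
print(f"{top_gene_fraction(counts, n=3):.2f}")  # → 0.90
```

If a few genes account for most of the library, their very high FPKM values are expected rather than artifacts, although they also reduce the sequencing depth left for everything else.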