Hello Biostars, I'm quite new to bioinformatics, so I'm sorry if I'm asking something stupid. I'm stuck on a problem with RNA-Seq data manipulation. I have my samples and I want a quantity that measures their expression without getting involved in the normalisation pipelines provided by edgeR or DESeq, since they depend on the set of samples being analysed. Given that I can't normalise all my samples together because they belong to different treatments and conditions, I thought I would use the FPKM values in the output of RSEM, the algorithm used to perform the first part of the analysis.
Reading the paper "Transcriptome analyses of the human retina identify unprecedented transcript diversity..." (Farkas 2013), I found this statement: "[...]Using the standard of 1–4 RPKM being equal to one transcript/cell, this suggests that we have detected between 1 to 2500 transcripts, at a minimum per cell [54]. Approximately 50% of all expressed transcripts fall within the 5–25 RPKM (5–25 transcripts/cell) range. "
I deduce that the assumption of 1-4 RPKM being equal to one transcript/cell derives from the Mortazavi 2008 paper, in which they introduced RPKM as a measure of expression, but I read that they had internal standards with which they could actually measure transcript levels and correlate them with RPKM.
Assuming I have understood the difference between FPKM and RPKM (I hope so), I honestly don't understand why Farkas and colleagues use RPKM, since they performed the analysis with an Illumina HiSeq 2000 instrument (which gives paired-end reads, so I presume they should use FPKM instead of RPKM). Given that I need a correspondence between FPKM and number of transcripts per cell, just like what they state in the passage I quoted above, what should I do? Should I treat my FPKM as an RPKM if and only if the library size is the same for my experiments and those of Farkas or, better, the libraries from Mortazavi? Is there anywhere an assumption like "5 FPKM = 1 transcript/cell"?
thanks in advance!
fab
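For reference, here is a minimal Python sketch of the arithmetic behind the question: how an FPKM value is computed from a fragment count, and what the "1-4 RPKM = one transcript/cell" rule of thumb quoted above would imply if it were applied as-is. The function names and the example numbers are made up for illustration, and the rule of thumb is only the range quoted from the paper, not a calibration for any particular dataset. For single-end data the same formula with reads in place of fragments gives RPKM.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    For paired-end data a fragment is counted once, whether one or both
    mates align; with single-end reads the same formula yields RPKM.
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def copies_per_cell_range(value, per_copy_low=1.0, per_copy_high=4.0):
    """Apply the quoted '1-4 RPKM = one transcript/cell' range.

    ASSUMPTION: the defaults are just the range quoted in the post. Without
    spike-in standards in your own experiment they are not a calibration.
    Returns (lowest estimate, highest estimate) in copies per cell.
    """
    return value / per_copy_high, value / per_copy_low


# Hypothetical transcript: 500 fragments, 2000 bp long, 25 million mapped fragments.
example_fpkm = fpkm(fragments=500, length_bp=2000, total_mapped_fragments=25_000_000)
low, high = copies_per_cell_range(example_fpkm)
print(f"FPKM = {example_fpkm:.1f}, implied copies/cell between {low:.1f} and {high:.1f}")
```

The width of the implied range is the point: the conversion factor depends on the cell type and library, which is why the original estimate relied on internal standards.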
I would guess that if splicing is accounted for, then fragments make more sense, but I am not sure. Also, while normalization is highly desirable, one can work with raw counts using Fisher's exact test.
Agreed on the splicing consideration. I hesitate to suggest Fisher's test, since it's often not testing what people naively think ("naively" only since the average biologist doesn't have much of a grasp of data analysis).
True about "custom" Fisher testing: one must be able to explain it.
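To make the count-based suggestion above concrete, here is a rough Python sketch of a two-sided Fisher's exact test on a 2x2 table of (reads on the gene, all other reads) for two samples. It uses only the standard library; the function name, the tolerance, and the example counts are assumptions chosen for illustration. Note what it actually tests: whether the gene's share of reads differs between two libraries, treating reads as independent draws. It does not model variability between replicates, which is the caveat raised in the comments.

```python
import math


def _log_comb(n, k):
    # log of the binomial coefficient C(n, k), via log-gamma to avoid huge integers
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def log_p(x):
        # log-probability that the top-left cell equals x, margins fixed
        return (_log_comb(col1, x)
                + _log_comb(total - col1, row1 - x)
                - _log_comb(total, row1))

    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    log_obs = log_p(a)
    # small tolerance so ties with the observed table are not lost to rounding
    p = sum(math.exp(log_p(x))
            for x in range(lowest, highest + 1)
            if log_p(x) <= log_obs + 1e-7)
    return min(1.0, p)


# Hypothetical counts: reads on one gene versus the rest of each library.
gene_a, library_a = 120, 20_000_000
gene_b, library_b = 200, 22_000_000
p_value = fisher_exact_two_sided(gene_a, library_a - gene_a,
                                 gene_b, library_b - gene_b)
print(f"two-sided p = {p_value:.3g}")
```

With library sizes in the tens of millions the log-gamma arithmetic is only approximate, so treat the result as a sketch of the idea rather than a replacement for an established statistics package.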