Hello,
I have transcript counts from RNA-seq data. There are three samples, and they are biological replicates of a cell line. My goal is to provide a ranked list of expressed genes, with some sort of expression quantifier for each gene/transcript.
I am wondering what the best way to normalize the data is. Should I just calculate RPKM values and then remove outliers per transcript? Or should I perform some sort of upper-quartile normalization? If so, what is the best way to do this?
Thank you for your help!
I am probably stating the obvious, but RNA-seq is not a measure of absolute expression. It is closer to absolute expression than microarrays (probably), but comparisons between genes (ranking by expression) should be taken with multiple, large grains of salt.
Comparing RPKM values between replicates should be fine, but if you want to optimize the scaling, check the paper "Optimal Scaling of Digital Transcriptomes".
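In case it helps, here is a minimal Python sketch of the RPKM calculation and of a simple upper-quartile scale factor, followed by a ranking on the mean RPKM across replicates. The gene names, counts, and lengths are made-up toy values, the function names are my own, and I am assuming the scale factor is the 75th percentile of the nonzero counts in each sample, which is one common variant rather than the only one. Treat it as an illustration, not a recommended pipeline.

```python
from statistics import mean, quantiles

# Hypothetical toy data: raw read counts for three biological replicates.
samples = [
    {"geneA": 100, "geneB": 200, "geneC": 700},
    {"geneA": 120, "geneB": 180, "geneC": 900},
    {"geneA": 90, "geneB": 210, "geneC": 600},
]
# Hypothetical transcript lengths in base pairs.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (one sample)."""
    total = sum(counts.values())
    return {g: c * 1e9 / (total * lengths_bp[g]) for g, c in counts.items()}


def upper_quartile_factor(counts):
    """75th percentile of the nonzero counts in one sample (assumed variant)."""
    nonzero = [c for c in counts.values() if c > 0]
    return quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_scale(all_samples):
    """Rescale each sample so that all samples share the same upper quartile."""
    factors = [upper_quartile_factor(s) for s in all_samples]
    target = mean(factors)
    return [
        {g: c * target / f for g, c in s.items()}
        for s, f in zip(all_samples, factors)
    ]


# Per-replicate RPKM, then the mean across replicates for each gene.
per_sample = [rpkm(s, lengths) for s in samples]
mean_rpkm = {g: mean(p[g] for p in per_sample) for g in lengths}

# Ranked list, highest mean RPKM first.
ranked = sorted(mean_rpkm.items(), key=lambda kv: kv[1], reverse=True)
for gene, value in ranked:
    print(gene, round(value, 1))

# Upper-quartile-scaled counts, as an alternative to total-count scaling.
uq_scaled = upper_quartile_scale(samples)
```

With only three replicates, I would report the per-replicate values next to the mean instead of dropping "outliers", since there is very little information to decide what an outlier is.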