7.3 years ago
Amos
Hi.
I have a large database of RNA-seq data from different cell lines. I'd like to get normalized gene expression levels for each cell line. From what I've read so far, it seems that gene expression is normalized relative to a reference or control (as with DESeq). Since the cell lines are independent of each other, I don't have one cell line that I would consider a control or a suitable reference point for gene expression levels.
Is there an accepted way of normalizing gene expression levels for single samples without comparing to a control?
Thank you.
I would say it depends on your biological question (e.g., do you want to use all of them and compare them, or pick some and then compare?). The most obvious first step is to normalize for library size, i.e., compute counts per million (CPM).
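To make the library-size step concrete, here is a minimal sketch of CPM for a single sample. The function name `cpm`, the gene names, and the counts are made up for illustration; in practice you would probably use an established package rather than hand-rolled code.

```python
def cpm(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no counted reads")
    return {gene: n * 1_000_000 / library_size for gene, n in counts.items()}


# Hypothetical raw counts for one cell line (illustrative values only).
sample = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
sample_cpm = cpm(sample)
```

Each sample is scaled only by its own total, so no control or reference sample is involved.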
I will want to pick some and compare them to each other later on, but essentially I want to compare the expression levels of different genes within each cell line to another metric that I calculated for each cell line.
What about transcripts per million (TPM) or reads per kilobase per million mapped reads (RPKM)?
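For reference, a rough sketch of how these two measures are usually defined, assuming you have raw counts and a gene length in base pairs for each gene. The helper names `rpkm` and `tpm` and all values are hypothetical, and real pipelines often use effective lengths rather than the plain gene length used here.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads in the sample."""
    total_reads = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total_reads) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    total_rate = sum(rate.values())
    return {g: r * 1e6 / total_rate for g, r in rate.items()}


# Hypothetical counts and gene lengths (illustrative values only).
counts = {"GENE_A": 100, "GENE_B": 100}
lengths_bp = {"GENE_A": 1000, "GENE_B": 2000}
sample_rpkm = rpkm(counts, lengths_bp)
sample_tpm = tpm(counts, lengths_bp)
```

Both genes have the same raw count here, but GENE_A is half as long, so it comes out with twice the TPM and twice the RPKM of GENE_B. TPM values always sum to one million within a sample, which RPKM values do not.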
Well, normalization can be within a sample or across samples. For comparing genes within a single cell line, I think you want CPM, TPM, or RPKM/FPKM (the last two also correct for gene length, which CPM does not). Later, when you want to compare sets of cell lines, you will probably want an across-sample normalization (e.g., RLE).
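As a rough illustration of the across-sample case, here is a sketch of RLE-style size factors using the median-of-ratios idea. It assumes every sample has counts for the same genes; the function name `rle_size_factors`, the sample names, and the counts are invented, and this is not the actual code of any package.

```python
import math
from statistics import median


def rle_size_factors(counts_by_sample):
    """Median-of-ratios size factors; input is {sample: {gene: raw count}}."""
    samples = list(counts_by_sample)
    genes = list(counts_by_sample[samples[0]])
    # Keep genes with nonzero counts in every sample so the logs are defined.
    usable = [g for g in genes
              if all(counts_by_sample[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Pseudo-reference: per-gene log geometric mean across all samples.
    log_ref = {
        g: sum(math.log(counts_by_sample[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor: median ratio of a sample's counts to the pseudo-reference.
    return {
        s: math.exp(median(math.log(counts_by_sample[s][g]) - log_ref[g]
                           for g in usable))
        for s in samples
    }


# Hypothetical counts: line_2 is sequenced twice as deeply as line_1.
counts_by_sample = {
    "line_1": {"GENE_A": 10, "GENE_B": 20, "GENE_C": 30},
    "line_2": {"GENE_A": 20, "GENE_B": 40, "GENE_C": 60},
}
size_factors = rle_size_factors(counts_by_sample)
```

The reference is a pseudo-sample built from all the samples together, so none of the cell lines has to be designated as a control. Dividing each sample's counts by its size factor puts the samples on a common scale.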