Question

Differential Expression from Quantile-Norm Data


7.0 years ago

matthew.m.hernandez ▴ 30

Hi everyone,

I'm new to analyzing RNA-seq data and wanted to get input on how to proceed with data that I received from my PI. The goal of the experiment was to compare gene expression across blood cells from different donors, all under the same condition. There are donors of one phenotype (e.g., S1, S2...) and donors of another phenotype (e.g., P1, P2...). I have been given two files: one with read counts and one with quantile-normalized values. The files are organized as follows:

Read Count File

Gene      S1          S2          P1
B2M       174991      119507      166104
LYZ       69046       35013       24405
....

Quantile Normalized File

Gene      S1          S2          P1
B2M       8449.38     8449.38     2821.43
LYZ       5186.47     1476.66     850.11
....

I have been told to assess differences between samples using the quantile-normalized values. However, if I want to compare the expression of B2M, for example, between samples (e.g., S1 and P1), do I need to normalize the quantile-normalized values to a housekeeping gene (e.g., GPI) and then compare, or do I just compare the values 8449.38 and 2821.43 directly?
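For context on what the second file contains, here is a minimal sketch of quantile normalization in plain Python. It assumes a small made-up count table (the gene values are hypothetical, not the data from this thread) and ignores ties, which real implementations usually handle by averaging. It illustrates that every sample ends up with the same set of values, so the normalized numbers already sit on a common scale across samples.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize a dict of sample -> list of values (same gene order).

    Ties are not treated specially in this sketch.
    """
    n_genes = len(next(iter(columns.values())))
    # Sort each sample's values, then average across samples at each rank.
    sorted_cols = [sorted(vals) for vals in columns.values()]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n_genes)]

    normalized = {}
    for sample, vals in columns.items():
        # order[r] is the index of the gene holding the r-th smallest value.
        order = sorted(range(n_genes), key=lambda i: vals[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized[sample] = out
    return normalized

# Hypothetical toy counts for four genes in three samples (not the real data).
counts = {
    "S1": [500, 120, 30, 8],
    "S2": [300, 90, 45, 2],
    "P1": [800, 60, 75, 20],
}
qn = quantile_normalize(counts)
```

In this toy table S1 and S2 receive identical normalized values because their genes rank in the same order, while P1 differs only where its gene ranks differ.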

Alternatively, should I go back to the read count file and re-analyze?

Furthermore, we'd like to run GSEA between the two phenotypes (e.g., S samples versus P samples). Any advice on how to combine the data for S donors and P donors to attempt this?

Any advice, insight, or pointers to relevant questions on Biostars would be greatly appreciated.

Thanks!

RNA-Seq differential-expression • 2.0k views

ADD COMMENT • link updated 7.0 years ago by Hussain Ather ▴ 990 • written 7.0 years ago by matthew.m.hernandez ▴ 30


Are you looking to find differentially expressed genes between S and P samples? Also, for GSEA, are you interested in finding pathways that are activated in S vs P samples or vice versa? For GSEA you will have to rank your genes first.

ADD REPLY • link 7.0 years ago by Matina ▴ 250


That's exactly what we're trying to do (both for comparing differentially expressed genes and for GSEA).

ADD REPLY • link 7.0 years ago by matthew.m.hernandez ▴ 30


OK, so have you tried using, let's say, edgeR or DESeq2 with the raw read counts? If you would prefer to use the normalised data, you could use a limma-based solution (voom itself expects raw counts; already-normalised values are usually log-transformed and given to limma directly). As for GSEA, after you determine differentially expressed genes between the groups of interest, you can rank the genes based on some criterion, e.g. FDR, and run GSEA.
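To make the ranking step concrete, here is a minimal sketch in plain Python. It is not a substitute for edgeR, DESeq2 or limma: it simply scores each gene with Welch's t statistic on log2-transformed values and sorts by that score. The gene names, sample names and values are hypothetical. A signed statistic is used here as one common choice for a ranked list, since it keeps the direction of change, which an FDR value alone does not.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (no p-value computed).

    Assumes each group has at least two values and non-zero variance overall.
    """
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expr, s_samples, p_samples):
    """Return (gene, t) pairs sorted from most S-high to most P-high."""
    scores = []
    for gene, by_sample in expr.items():
        # log2(x + 1) keeps zero values finite.
        s_vals = [math.log2(by_sample[s] + 1) for s in s_samples]
        p_vals = [math.log2(by_sample[s] + 1) for s in p_samples]
        scores.append((gene, welch_t(s_vals, p_vals)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical normalized values: gene -> sample -> expression.
expr = {
    "GENE_A": {"S1": 900, "S2": 1100, "S3": 1000, "P1": 100, "P2": 120, "P3": 90},
    "GENE_B": {"S1": 50, "S2": 60, "S3": 55, "P1": 400, "P2": 380, "P3": 450},
    "GENE_C": {"S1": 200, "S2": 260, "S3": 180, "P1": 210, "P2": 190, "P3": 250},
}
ranked = rank_genes(expr, ["S1", "S2", "S3"], ["P1", "P2", "P3"])
```

The resulting list runs from genes higher in S at the top to genes higher in P at the bottom, which is the shape a pre-ranked enrichment analysis expects.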

ADD REPLY • link 7.0 years ago by Matina ▴ 250


Thanks for the advice, Matina. Maybe you could provide further input, though. From the little that I've seen, the tutorials for those pipelines are based on the same samples with and without some condition, and they compare gene expression between the two. In this case, however, my samples are cells from, let's say, 6 distinct people: 3 of one condition and 3 of another. No sample exists in both conditions, as they are different donors. How would one compare gene expression through those pipelines, then?

(My apologies for any naivety with this question, by the way. And again, thanks for your help).
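One way to picture this layout is as an unpaired two-group design, in which each donor is one biological replicate of its phenotype and the comparison is group S versus group P. The sketch below, in plain Python, builds the kind of sample table such tools typically take as input. It assumes sample names like `S1` or `P2`, where the leading letter is the phenotype; the six-sample list and the helper names are hypothetical.

```python
def build_sample_table(sample_names):
    """Map each sample to a phenotype group using its name prefix.

    Assumes names like 'S1' or 'P2', where the first letter is the phenotype.
    """
    table = []
    for name in sample_names:
        # Each donor appears once, so donor and sample coincide (unpaired).
        table.append({"sample": name, "donor": name, "group": name[0]})
    return table

def split_by_group(table):
    """Collect sample names per phenotype group, preserving input order."""
    groups = {}
    for row in table:
        groups.setdefault(row["group"], []).append(row["sample"])
    return groups

samples = ["S1", "S2", "S3", "P1", "P2", "P3"]  # hypothetical 3 vs 3 layout
sample_table = build_sample_table(samples)
groups = split_by_group(sample_table)
```

The `group` column is the only factor in this design; no donor needs to appear in both groups for the comparison to be defined.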

ADD REPLY • link 7.0 years ago by matthew.m.hernandez ▴ 30

score 0 · Answer 1 · 2018-05-04


7.0 years ago

Hussain Ather ▴ 990

You should normalize them to a housekeeping gene.
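As a rough illustration of what that would involve, the plain-Python sketch below divides each gene's value by the housekeeping gene's value in the same sample. The B2M values are the ones quoted in the question; the GPI values are invented for illustration, since the thread does not give them, and the function name is hypothetical. Whether this step is appropriate on already quantile-normalized data is not something the sketch settles.

```python
def relative_to_reference(expr, reference_gene):
    """Divide every gene's value by the reference gene's value, per sample."""
    ref = expr[reference_gene]
    relative = {}
    for gene, by_sample in expr.items():
        relative[gene] = {s: by_sample[s] / ref[s] for s in by_sample}
    return relative

# B2M values are from the question; the GPI values are made up for this sketch.
expr = {
    "B2M": {"S1": 8449.38, "P1": 2821.43},
    "GPI": {"S1": 400.0, "P1": 200.0},
}
rel = relative_to_reference(expr, "GPI")
```

After this step the reference gene is 1.0 in every sample, and each remaining value reads as a ratio to the housekeeping gene.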

ADD COMMENT • link 7.0 years ago by Hussain Ather ▴ 990