Question

Is it valid to compare the expression of pre-defined candidate genes from RNA-seq data?

7.5 years ago

antoinefelden ▴ 60

Hi,

In addition to a classic differential expression analysis, I'd like to investigate the expression of pre-defined 'candidate genes' from my RNA-seq data.

What I've done: from a TMM-normalised transcript quantification matrix (the same kind of matrix that is used for differential expression analysis by edgeR, voom, DESeq2, etc.), I pulled out the genes I was interested in, then log2-transformed and scaled the TMM values to produce heatmaps. The heatmaps do show condition-specific expression patterns, although these did not show up in the differential expression analysis. I frame it as an exploratory analysis, not as a definitive differential expression analysis.
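For readers who want to see the transformation spelled out, here is a minimal sketch of the subsetting and scaling step. It is written in plain Python rather than R so it runs without extra packages. The gene names, sample names, and values are made up for illustration, and the pseudocount of 1 is an assumption (the post does not say how zeros were handled).

```python
import math
import statistics

# Hypothetical TMM-normalised values (made-up numbers, illustration only).
# Each gene maps to one value per sample: 3 control, then 3 treated.
samples = ["ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3"]
tmm = {
    "gene_A": [10.0, 12.0, 11.0, 30.0, 28.0, 33.0],
    "gene_B": [200.0, 210.0, 190.0, 205.0, 195.0, 198.0],
    "gene_C": [0.0, 1.0, 0.0, 4.0, 5.0, 3.0],
}
candidates = ["gene_A", "gene_C"]  # hypothetical candidate gene IDs


def log2_zscore(values, pseudocount=1.0):
    """log2-transform, then centre and scale one gene across samples.

    The pseudocount (an assumption) keeps log2 defined for zero values.
    The sample standard deviation (n - 1) is used, as R's scale() does.
    """
    logged = [math.log2(v + pseudocount) for v in values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


# Keep only the candidate genes and transform each gene independently.
heatmap_matrix = {gene: log2_zscore(tmm[gene]) for gene in candidates}

for gene, row in heatmap_matrix.items():
    print(gene, [round(z, 2) for z in row])
```

Because each gene is scaled separately, the heatmap colours show relative differences between samples within a gene, not absolute expression levels. A gene with identical values in every sample has a standard deviation of zero and would need to be dropped before this step.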

However, I cannot find any similar published workflow. I tried various ways of googling this question and could not find a single comment on such an approach, which I find very surprising. Can anyone help by providing sources or insights on this?

Thanks, Antoine

RNA-Seq • 1.8k views

score 0 · Answer 1 · 2017-11-25

Just because you observe differences in your candidate genes doesn’t mean they are DE genes. Typically, a pre-defined p-value is used as a cut-off to define the DE genes. If your candidates don’t reach a p-value of 0.5 or lower, in your next experiment you could assume a greater p-value like 0.8, since you are currently establishing an a priori hypothesis. Anyhow, how did you come by your candidates? Have they been shown biologically to control the phenotype? Perhaps they are false-positive candidates?

score 0 · Answer 2 · 2017-11-25

7.5 years ago

antoinefelden ▴ 60

To be specific, I'm trying to investigate whether immune genes show condition-specific expression profiles. I first produced heatmaps with a set of immune genes that do show consistent differences between conditions across replicates. Then, I fitted a simple glm(scale(log2(TMMs))~condition), which gave significant estimates for condition. I understand that this can only be exploratory and has to be taken with caution, as there are likely false positives in there. But I'm also surprised not to find similar approaches anywhere I looked. Is it that flawed to interrogate a TMM-normalised expression matrix from RNA-seq data with good sample replication?
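To make the model concrete: R's glm() defaults to a Gaussian family, so with a two-level condition the fit reduces to an ordinary least-squares comparison of two group means. The sketch below reproduces that calculation in plain Python for a single gene. It assumes the model is fitted one gene at a time with exactly two conditions, which the post does not state. The function name two_group_fit and all values are hypothetical.

```python
import math
import statistics


def two_group_fit(values, groups, reference):
    """Least-squares fit of value ~ group for exactly two groups.

    Returns the group coefficient (non-reference mean minus reference
    mean), its standard error, the t statistic, and the residual
    degrees of freedom.
    """
    ref = [v for v, g in zip(values, groups) if g == reference]
    other = [v for v, g in zip(values, groups) if g != reference]
    ref_mean = statistics.mean(ref)
    other_mean = statistics.mean(other)
    estimate = other_mean - ref_mean
    df = len(values) - 2
    # Residual sum of squares around each group's own mean.
    rss = sum((v - ref_mean) ** 2 for v in ref)
    rss += sum((v - other_mean) ** 2 for v in other)
    sigma2 = rss / df
    se = math.sqrt(sigma2 * (1 / len(ref) + 1 / len(other)))
    return estimate, se, estimate / se, df


# Hypothetical TMM values for one gene (made-up numbers, illustration only).
tmm = [10.0, 12.0, 11.0, 30.0, 28.0, 33.0]
condition = ["control"] * 3 + ["treated"] * 3

# Equivalent of scale(log2(TMMs)) for a single gene.
logged = [math.log2(v) for v in tmm]
mean, sd = statistics.mean(logged), statistics.stdev(logged)
scaled = [(x - mean) / sd for x in logged]

estimate, se, t_stat, df = two_group_fit(scaled, condition, "control")
print(f"estimate={estimate:.3f} se={se:.3f} t={t_stat:.3f} df={df}")
```

The t statistic is then compared against a t distribution with df degrees of freedom to get a p-value. That step is left out here because the Python standard library has no t distribution.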

A bit of background: I work on a non-model organism, the Argentine ant, which has a sequenced genome but limited annotation and no actual functional characterisation. That is why I try to look at the big picture as much as I can: not much is known about the genes that turned out to be differentially expressed in the actual DE analysis.

ADD REPLY • link 7.5 years ago by antoinefelden ▴ 60