How to analyze the RNAseq paired samples with DESeq2?
2
0
Entering edit mode
4.9 years ago
Kenneth • 0

I am a graduate student who works with the RNAseq data for the first time. I have been stuck on DESeq2 for quite a while so it would be great to have some feedback on my analysis! My question is: how to find the condition effect while controlling the participant effect?

This is what we have done:

  • Our study delivers 3 different conditions and 1 control treatment for each participant (13 total).
  • We sequenced the RNA from the samples
  • FastQC indicates a very good quality of the fastq data
  • I then used Salmon to align and assembly against the Ensembl transcriptome 98
  • I read the DESeq2 vignette and this tutorial, and I am able to reproduce the results
  • Now I attempted to run DESeq2 on my data to find the differential expressed genes
  • I used dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~participant+condition) to create the dds object. Then I simply ran dds <- DESeq(dds) and res <- results(dds)
  • sum(res$padj < 0.1, na.rm=TRUE) return 4
  • sum(res$pvalue < 0.1, na.rm=TRUE) return 1448
  • I tried to use 'contrast' for other conditions comparison but they are very similar to this example

This is the PCA plot of the paired data:

PCA plot of the paired data

This is the results output and p-value histogram: enter image description here enter image description here

From plotDispEsts(dds)

enter image description here

Summary of the results

enter image description here

The four genes from above enter image description here

RNA-Seq R • 5.3k views
ADD COMMENT
1
Entering edit mode

Did you have one replicate for each condition? Do all the participants have the same sex? If two or three conditions are similar I would combine them and test against the control to increase power.

ADD REPLY
0
Entering edit mode

I have 13 different participants for each condition and they are 50/50 male and female

ADD REPLY
1
Entering edit mode

So what is your question exactly? "I don't know what to do now" is very vague; is there a specific problem you're stuck with?

ADD REPLY
0
Entering edit mode

I want to learn how to find the differentially expressed genes in this case, but I don't know how to proceed now!

ADD REPLY
1
Entering edit mode

Time for a different approach: If I were you, I would start to generate some sample to sample plots to show that there is not much 'distance' between the samples, hence the poor results. You can't just say it didn't work, you have to show why. They would complement the PCA plot.

ADD REPLY
0
Entering edit mode

I tried the hcluster package to get some plots, and it showed that the samples cluster within each participant as expected. But they don't cluster based on the condition. Is there a way to generate a cluster plot base on condition while accounting the differences between participants? Thanks!

ADD REPLY
0
Entering edit mode

No. Clustering is unsupervised...

ADD REPLY
1
Entering edit mode

What is your actual question? You've not provided us with any summary of your results: how many genes were differentially expressed for a typical treatment-vs-control comparison?

ADD REPLY
0
Entering edit mode

Sorry I didn't make it clear. I updated the post with the results summary. My question is how can I find the differential expressed genes based on what I have? Thanks for your time!

ADD REPLY
0
Entering edit mode

You can use CSI NGS Portal for a full Differential Expression analysis using DESeq2.

ADD REPLY
0
Entering edit mode

If you aim to promote your tool, please make a Tool post. OPs question is not about this. I will therefore move it to comment. Please do not advertise your tool if it doesn't add to the question.

ADD REPLY
1
Entering edit mode
4.9 years ago

It would be easy to conclude from this analysis that there is no effect of your condition. However, I am slightly drawn by the fact your p-value histogram skews towards one. In the case that there was genuinely no effect of the treatment - that is all the null hypotheses were true, we'd expect to see a flat p-value histogram.

It also looks like you've got a coherent group of genes that have a very high variance compared to the others. which I guess could be doing something to the variance fit, and making the model conservative? What are your mapping rates? You can find this by looking in the log files that come out of salmon?

Also, what do your library sizes look like?

ADD COMMENT
0
Entering edit mode

Thanks for your kind help! Some other p-value histogram shows a left-skewed like this

enter image description here

My mapping rates are about 85~92% and the average library size is 49 million

ADD REPLY
0
Entering edit mode
4.9 years ago

At first glance at your PCA, I'd say your condition doesn't do anything.

ADD COMMENT

Login before adding your answer.

Traffic: 1828 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6