I am a graduate student who works with the RNAseq data for the first time. I have been stuck on DESeq2 for quite a while so it would be great to have some feedback on my analysis! My question is: how to find the condition effect while controlling the participant effect?
This is what we have done:
- Our study delivers 3 different conditions and 1 control treatment for each participant (13 total).
- We sequenced the RNA from the samples
- FastQC indicates a very good quality of the fastq data
- I then used Salmon to align and assembly against the Ensembl transcriptome 98
- I read the DESeq2 vignette and this tutorial, and I am able to reproduce the results
- Now I attempted to run DESeq2 on my data to find the differential expressed genes
- I used
dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~participant+condition)
to create the dds object. Then I simply randds <- DESeq(dds)
andres <- results(dds)
sum(res$padj < 0.1, na.rm=TRUE)
return 4sum(res$pvalue < 0.1, na.rm=TRUE)
return 1448- I tried to use 'contrast' for other conditions comparison but they are very similar to this example
This is the PCA plot of the paired data:
This is the results output and p-value histogram:
From plotDispEsts(dds)
Summary of the results
The four genes from above
Did you have one replicate for each condition? Do all the participants have the same sex? If two or three conditions are similar I would combine them and test against the control to increase power.
I have 13 different participants for each condition and they are 50/50 male and female
So what is your question exactly? "I don't know what to do now" is very vague; is there a specific problem you're stuck with?
I want to learn how to find the differentially expressed genes in this case, but I don't know how to proceed now!
Time for a different approach: If I were you, I would start to generate some sample to sample plots to show that there is not much 'distance' between the samples, hence the poor results. You can't just say it didn't work, you have to show why. They would complement the PCA plot.
I tried the hcluster package to get some plots, and it showed that the samples cluster within each participant as expected. But they don't cluster based on the condition. Is there a way to generate a cluster plot base on condition while accounting the differences between participants? Thanks!
No. Clustering is unsupervised...
What is your actual question? You've not provided us with any summary of your results: how many genes were differentially expressed for a typical treatment-vs-control comparison?
Sorry I didn't make it clear. I updated the post with the results summary. My question is how can I find the differential expressed genes based on what I have? Thanks for your time!
You can use CSI NGS Portal for a full Differential Expression analysis using DESeq2.
If you aim to promote your tool, please make a
Tool
post. OPs question is not about this. I will therefore move it to comment. Please do not advertise your tool if it doesn't add to the question.