No differentially expressed genes
bart, 3.0 years ago

Hi,

I have a dataset of 36k lncRNAs, and I'm using DESeq2 to find differentially expressed lncRNAs between a healthy group and a disease group. Unfortunately, I cannot find any lncRNAs with a low padj value. However, when I explore the data by dividing each gene's mean count in the disease group by its mean count in the healthy group and taking the log2 of that ratio, the log2 fold change is > 2 or < -2 for some genes, which to me suggests that some genes should be differentially expressed. What also seems strange to me is that samples from the same group do not cluster together well.

To make things clear, this is the fold change I computed for each gene: log2(mean count of the gene in the disease group / mean count of the gene in the healthy group). I have also added a histogram of these fold changes and a PCA plot.
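The fold change described above can be computed per gene in base R. This is a minimal sketch on a small hypothetical count matrix (the gene names, sample names and values are made up). It assumes the matrix holds size-factor-normalized counts, such as those returned by counts(dds, normalized = TRUE), so that differences in sequencing depth between samples do not distort the group means.

```r
# Hypothetical toy matrix: rows = genes, columns = samples (assumed normalized counts)
counts <- matrix(c(40, 44, 38, 42, 10,  9, 11, 10,
                    5,  6,  4,  5, 20, 22, 18, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), paste0("s", 1:8)))
# Group label of each column, in the same order as the columns
group <- rep(c("disease", "healthy"), each = 4)

# Per-gene mean count in each group
mean_disease <- rowMeans(counts[, group == "disease", drop = FALSE])
mean_healthy <- rowMeans(counts[, group == "healthy", drop = FALSE])

# log2 of the ratio of group means (disease / healthy)
log2fc <- log2(mean_disease / mean_healthy)
print(round(log2fc, 2))
```

With these toy values geneA comes out just above +2 and geneB at exactly -2.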

Lastly, this is the code that I used for DESeq2:

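library(DESeq2)
# Build the DESeqDataSet once; "design" is the metadata column with the cancer/healthy labels
dds <- DESeqDataSetFromMatrix(countData = df, colData = metadata, design = ~ design)
# Size-factor normalized counts, handy for inspecting individual genes
dds <- estimateSizeFactors(dds)
ddsnormalized <- counts(dds, normalized = TRUE)
# Variance-stabilized data for the exploratory PCA
vsdlncrna <- vst(dds, blind = TRUE)
plotPCA(vsdlncrna, intgroup = "design")
# Differential expression, cancer vs healthy (DESeq() reuses the size factors estimated above)
dds <- DESeq(dds)
lncrnares <- results(dds, contrast = c("design", "cancer", "healthy"), alpha = 0.05)
summary(lncrnares)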

Does anyone know if I made a mistake here? Thanks in advance!

[Figure: histogram of the log2 fold changes, showing the number of genes at each fold-change value]

[Figure: PCA plot of the cancer and healthy groups]


I'm not an expert in this, but as far as I know, a high or low log2 fold change does not always mean that a gene is differentially expressed. A single outlier sample can drag the mean of its whole group up or down. This gives a large absolute log2 fold change, but padj may still not be < 0.05 (or whatever your cutoff is) because the within-group variance is large. Also, judging from the PCA plot, your samples do not cluster by healthy and disease group, so it may really be that there aren't many differentially expressed lncRNAs in your dataset.
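A small base-R illustration of the outlier effect, using hypothetical normalized counts for a single gene (the values are invented for this example). The median comparison is not part of DESeq2; it is included only to show how much of the fold change comes from one sample.

```r
# Hypothetical normalized counts for one gene: 4 disease and 4 healthy samples
disease <- c(10, 12, 11, 300)  # the fourth sample is an extreme outlier
healthy <- c(9, 10, 11, 12)

# log2 ratio of means: the single extreme sample inflates the disease mean
lfc_mean <- log2(mean(disease) / mean(healthy))
# log2 ratio of medians: barely affected by that one sample
lfc_median <- log2(median(disease) / median(healthy))
# Within-group spread (standard deviation) of each group
spread <- c(disease = sd(disease), healthy = sd(healthy))

print(round(c(lfc_mean = lfc_mean, lfc_median = lfc_median), 2))
print(round(spread, 1))
```

Here the ratio of means gives a log2 fold change of about 3, while the ratio of medians gives about 0.13, and the disease group's spread is far larger than the healthy group's. High within-group variability of this kind is one reason a gene with a large ratio of means can still get a high padj.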

One suggestion would be to look at the genes whose log2 ratio of means is > 2 or < -2 (the method you are using) and check whether any of their samples are outliers. Alternatively, you could use a statistical test other than DESeq2's default Wald test. I have never used anything other than the default myself, so others may be of more help.
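If you want to try an alternative test, DESeq2 also offers a likelihood ratio test via DESeq(..., test = "LRT", reduced = ...). The sketch below runs it on simulated data from makeExampleDESeqDataSet() so that it is self-contained; with your own data you would call it on your dds (design ~ design) with reduced = ~ 1. Note that for a simple two-group comparison, the LRT against an intercept-only model tests the same null hypothesis as the Wald test, so the results will usually be similar.

```r
library(DESeq2)

# Simulated stand-in for the real data: a DESeqDataSet with design ~ condition
# (a two-level factor, A vs B), 500 genes and 8 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Likelihood ratio test: full model (~ condition) vs intercept-only model (~ 1)
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt, alpha = 0.05)
summary(res_lrt)
```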

Cheers!


Hi, thanks for your input. I agree that one does not imply the other, but I thought it would make it more likely.

Could you elaborate on your first suggestion? How could I use the fold changes to find out which samples are outliers? I will also try your other suggestion.


Oh, I just meant that you could take a gene that looks differentially expressed according to the ratio of means, say gene X with log2(ratio of means) = +2.5. This looks differentially expressed, so simply look at the individual read counts of the samples for this gene and see whether any of them are outliers. If you find an outlier, that is likely why DESeq2 found too much variance for this gene and gave it a high padj value.
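A sketch of that check in DESeq2, again on simulated data from makeExampleDESeqDataSet() so it runs on its own. "gene1" is a placeholder for gene X; with your data you would use your own dds, a real lncRNA ID and the "design" column instead of "condition". The normalized counts show each sample's value for the gene, and the Cook's distances that DESeq() stores in assays(dds)[["cooks"]] indicate how strongly each sample influences that gene's fit, so a single value far above the rest points to an outlier.

```r
library(DESeq2)

# Simulated stand-in for the real data: 200 genes, 8 samples, factor "condition"
set.seed(2)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)
dds <- DESeq(dds)

# Placeholder for the gene of interest ("gene X" in the text)
gene_x <- "gene1"

# Normalized count of every sample for this gene, next to its group label
per_sample <- data.frame(condition = dds$condition,
                         count = counts(dds, normalized = TRUE)[gene_x, ])
print(per_sample)

# The same counts drawn as a dot plot per group
plotCounts(dds, gene = gene_x, intgroup = "condition")

# Cook's distance of every sample for this gene
cooks_x <- assays(dds)[["cooks"]][gene_x, ]
print(round(cooks_x, 2))
```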

Does this help? Cheers!
