5.6 years ago
newbie
Hi,
I have raw counts for around 100 tumor samples. I'm interested in co-expression analysis of a single lncRNA against all protein coding genes, to check which protein coding genes are strongly correlated with it.
I have a data frame df with all protein coding genes and one lncRNA, containing count data.
dim(df)
## [1] 19803 100
head(df)[1:5,1:4]
sample1 sample2 sample3 sample4
A1BG 14 59 11 31
A1CF 0 4 1 0
A2M 6509 7708 7306 16869
A2ML1 64 71 1317 3406
A3GALT2 7 28 8 0
# Variance-stabilise the raw count matrix (genes in rows, samples in columns)
U3 <- as.matrix(df)
library(DESeq2)
vsd <- vst(U3, blind = FALSE)
oed <- vsd
gene.names <- rownames(oed)
# WGCNA expects samples in rows and genes in columns
trans.oed <- t(oed)
dim(trans.oed)
n <- 19803
datExpr <- trans.oed[, 1:n]
dim(datExpr)
SubGeneNames <- gene.names[1:n]
library(WGCNA)
options(stringsAsFactors = FALSE)
allowWGCNAThreads()
# Candidate soft-thresholding powers to scan
powers <- c(1:10, seq(from = 12, to = 20, by = 2))
sft <- pickSoftThreshold(datExpr, dataIsExpr = TRUE,
                         powerVector = powers, corFnc = cor,
                         corOptions = list(use = "p"), networkType = "unsigned")
And the plot looks like this:
What is the reason for a soft threshold power plot like the one above? Which power should I select?
Please search in your search engine. There are many questions on the soft thresholding power, how to choose the best value, and what this threshold means.
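For readers landing here, one heuristic that is often used can be sketched as follows: take the lowest power at which the signed scale-free fit R^2 reaches a cutoff. The table below is an invented stand-in that only mimics some column names of sft$fitIndices, the 0.85 cutoff is an assumption, and choose_power is a hypothetical helper, not a WGCNA function.

```r
# Toy stand-in for sft$fitIndices (all values invented for illustration).
fit <- data.frame(
  Power    = c(1, 2, 3, 4, 5, 6),
  SFT.R.sq = c(0.10, 0.45, 0.70, 0.86, 0.90, 0.92),
  slope    = c(-0.5, -0.9, -1.1, -1.3, -1.4, -1.5),
  mean.k.  = c(3000, 900, 350, 160, 80, 45)
)

# Hypothetical helper: lowest power whose signed R^2 reaches the cutoff.
# The sign flip makes fits with a positive slope count against the power.
choose_power <- function(fit, r2_cut = 0.85) {
  signed_r2 <- -sign(fit$slope) * fit$SFT.R.sq
  ok <- which(signed_r2 >= r2_cut)
  if (length(ok) == 0) return(NA_real_)  # no power reaches the cutoff
  min(fit$Power[ok])
}

chosen <- choose_power(fit)
```

If no power reaches the cutoff the helper returns NA, which is usually a sign to look at the data (outlier samples, strong batch or condition effects) rather than to raise the power further.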
Hi,
I have checked some posts and tutorials and found the answer I needed. I have a small follow-up question.
I'm trying to build a co-expression network between some lncRNAs of interest and protein coding genes. But after getting the modules, I see that all of my lncRNAs of interest are in the grey module, which is basically the module of unassigned genes.
I'm still very interested in finding protein coding genes co-expressed with these lncRNAs. What should I do now?
Maybe your lncRNAs have low expression, which is why they are in that module. What was your input data to WGCNA? - normalised counts or normalised + transformed (e.g. logged, Z-transformed) counts?
I have the raw counts for around 100 tumor samples. I used data, which is a matrix with 100 samples and 14k protein coding genes plus my lncRNAs of interest. I used the variance stabilising transformation from DESeq2. Along with the 100 tumor samples, I also have some 50 normal samples. For WGCNA I used only the 100 tumor samples, because I wanted to find co-expressed genes specific to the tumor condition.
May I know the answer please.
There is no further answer to give, really. Variance-stabilised counts should be okay for WGCNA. You could try rlog counts, instead, if you wished.
Going back a few steps, you should remove genes of low counts prior to normalisation in DESeq2. It seems strange that most of your lncRNAs are in the same module - the conclusion that I have is that most of them are originally of low expression, and perhaps should have been filtered out.
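As a rough illustration of the low-count filtering mentioned above, here is a minimal base R sketch on an invented toy matrix. The thresholds (at least 10 counts in at least 3 samples) are assumptions chosen for the example, not values recommended in this thread, and the gene names are made up.

```r
# Toy raw count matrix: 4 genes x 8 samples (values invented for illustration).
counts <- rbind(
  highGene = c(650, 770, 730, 1686, 900, 820, 640, 710),
  midGene  = c(14, 59, 11, 31, 40, 22, 18, 35),
  lowGene  = c(0, 4, 1, 0, 0, 2, 0, 1),
  lncRNA_x = c(2, 0, 12, 1, 0, 15, 3, 0)
)
colnames(counts) <- paste0("sample", 1:8)

min_count   <- 10  # assumed minimum count for a gene to count as "expressed"
min_samples <- 3   # assumed minimum number of samples meeting min_count

# Keep genes that reach min_count in at least min_samples samples.
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

# Genes removed by the filter (useful for checking whether lncRNAs drop out).
removed <- rownames(counts)[!keep]
```

The filtered matrix would then go into the DESeq2 transformation in place of the full matrix. Checking removed for the lncRNAs of interest shows directly whether low expression explains why they end up unassigned.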
Hey Kevin,
As I didn't find any co-expressed genes with WGCNA for my lncRNA of interest, I tried correlation analysis using the Pearson method and filtered on p-value < 0.05.
Should the co-expressed genes be only positively co-expressed genes, i.e. r > +0.5, or should I also use genes with negative values for pathway analysis?
The negative genes are equally as informative as the positive genes, no? - you can analyse them together in pathway analysis, or do 2 separate analyses for:
You should only include the correlations with p-value < 0.05
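The correlation approach discussed here can be sketched in base R roughly as follows. The data are simulated, the gene names are invented, and the |r| > 0.5 and p < 0.05 cutoffs are simply the ones mentioned in this thread. With thousands of genes, an adjusted p-value (e.g. via p.adjust) may be the safer filter, but that is an assumption on my part.

```r
set.seed(42)
n_samples <- 100

# Simulated transformed expression for one lncRNA and three genes.
lnc <- rnorm(n_samples)
expr <- rbind(
  pos_gene  = lnc + rnorm(n_samples, sd = 0.3),   # built to correlate positively
  neg_gene  = -lnc + rnorm(n_samples, sd = 0.3),  # built to correlate negatively
  null_gene = rnorm(n_samples)                    # unrelated to the lncRNA
)

# Pearson correlation and p-value of each gene against the lncRNA.
r <- apply(expr, 1, function(g) cor(g, lnc, method = "pearson"))
p <- apply(expr, 1, function(g) cor.test(g, lnc, method = "pearson")$p.value)

res <- data.frame(gene = rownames(expr), r = r, p = p,
                  stringsAsFactors = FALSE)

# Keep significant, reasonably strong correlations in either direction.
sig <- res[res$p < 0.05 & abs(res$r) > 0.5, ]

# Split by direction so the two sets can be analysed together or separately.
positive <- sig$gene[sig$r > 0]
negative <- sig$gene[sig$r < 0]
```

The positive and negative vectors can be passed to pathway analysis separately, or sig$gene as one combined list.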
Sure. Thank you for the reply.