Soft threshold for co-expression analysis using WGCNA based on scale independence
5.6 years ago
newbie

Hi,

I have raw counts for around 100 tumor samples. I'm interested in a co-expression analysis of a single lncRNA against all protein-coding genes, to find which protein-coding genes are strongly correlated with it.

I have a data frame, df, containing raw counts for all protein-coding genes plus the one lncRNA.

dim(df)
## [1] 19803   100

head(df)[1:5,1:4]

                    sample1        sample2          sample3          sample4
A1BG                  14               59               11               31
A1CF                   0                4                1                0
A2M                 6509             7708             7306            16869
A2ML1                 64               71             1317             3406
A3GALT2                7               28                8                0


U3 <- as.matrix(df)            # genes x samples matrix of raw counts

library(DESeq2)
vsd <- vst(U3, blind = FALSE)  # variance-stabilising transformation; returns a matrix for matrix input

# WGCNA expects samples in rows and genes in columns
datExpr <- t(vsd)
dim(datExpr)                   # 100 samples x 19803 genes
SubGeneNames <- colnames(datExpr)

library(WGCNA)
options(stringsAsFactors = FALSE)
allowWGCNAThreads()

# candidate soft-thresholding powers: 1 to 10, then 12 to 20 in steps of 2
powers <- c(1:10, seq(from = 12, to = 20, by = 2))
sft <- pickSoftThreshold(datExpr, dataIsExpr = TRUE,
                         powerVector = powers, corFnc = cor,
                         corOptions = list(use = "p"),
                         networkType = "unsigned")

The resulting plot looks like this:

[Image: scale-independence plot (not available)]

Why does the scale-independence plot look like this across the soft-thresholding powers, and which power should I select?


Please search online first. There are many existing questions about the soft-thresholding power: how to choose the best value and what the threshold means.
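The usual rule of thumb from the WGCNA tutorials is to pick the lowest power at which the scale-free topology fit index levels off at a high value, commonly somewhere around 0.8 to 0.9. With WGCNA loaded, sft$fitIndices holds the per-power fits and sft$powerEstimate the package's own suggestion. To show what the scale-independence plot measures, here is a minimal base-R sketch of that fit. It is a simplification, not WGCNA's code (empty connectivity bins are dropped, so numbers will not match exactly); the 0.85 cutoff, the function names and the random toy data are assumptions for illustration.

```r
# Simplified base-R version of the scale-free topology fit (not WGCNA's code).
# datExpr: samples x genes numeric matrix, e.g. VST values.
scale_free_fit <- function(datExpr, power, nBreaks = 10) {
  adj <- abs(cor(datExpr))^power               # unsigned adjacency |r|^power
  diag(adj) <- 0
  k <- rowSums(adj)                            # connectivity of each gene
  bins <- cut(k, nBreaks)                      # discretise connectivity
  dk <- as.numeric(tapply(k, bins, mean))      # mean connectivity per bin
  p_dk <- as.numeric(table(bins)) / length(k)  # fraction of genes per bin
  ok <- p_dk > 0                               # keep non-empty bins only
  fit <- lm(log10(p_dk[ok]) ~ log10(dk[ok]))   # log p(k) vs log k
  r2 <- summary(fit)$r.squared
  slope <- unname(coef(fit)[2])
  c(power = power, R2 = r2, slope = slope,
    signedR2 = -sign(slope) * r2,              # as plotted in the WGCNA tutorial
    mean.k = mean(k))
}

# Lowest power whose signed R^2 reaches the (assumed) cutoff; NA if none does.
pick_power <- function(datExpr, powers = c(1:10, seq(12, 20, by = 2)),
                       cutoff = 0.85) {
  fits <- t(sapply(powers, function(b) scale_free_fit(datExpr, b)))
  hit <- which(fits[, "signedR2"] >= cutoff)
  list(table = fits,
       power = if (length(hit) > 0) fits[hit[1], "power"] else NA)
}

# Toy data: 50 samples x 200 genes of pure noise, so no scale-free
# structure is expected here.
set.seed(1)
datExpr <- matrix(rnorm(50 * 200), nrow = 50)
res <- pick_power(datExpr)
res$table
res$power
```

Mean connectivity always falls as the power rises, because every |r| below 1 shrinks when raised to a higher power. The trade-off is therefore between a good scale-free fit and keeping enough connectivity for modules to form.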


Hi,

I have checked some posts and tutorials and found the answer I needed. I have one more question.

I'm trying to build this co-expression network between some lncRNAs of interest and protein-coding genes. But after obtaining the modules, I see that all my lncRNAs of interest are in the grey module, which is the module of unassigned genes.

However, I'm very interested in finding the protein-coding genes co-expressed with these lncRNAs. What should I do now?


Maybe your lncRNAs have low expression, which is why they are in that module. What was your input data to WGCNA: normalised counts, or normalised and transformed (e.g. logged, Z-transformed) counts?
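One quick way to check the low-expression explanation is to see where the lncRNAs of interest fall in the distribution of mean expression across all genes. A base-R sketch follows; the toy matrix and the names gene1 and gene2 are made up and stand in for the VST matrix and the real lncRNA IDs.

```r
# Percentile of each gene's mean expression among all genes (100 = highest).
# expr: genes x samples matrix (e.g. VST values); ids: row names to look up.
expression_percentile <- function(expr, ids) {
  avg <- rowMeans(expr)
  pct <- 100 * rank(avg) / length(avg)
  pct[ids]
}

# Toy data: 1000 genes x 20 samples; gene1 and gene2 are hypothetical
# stand-ins for weakly expressed lncRNAs.
set.seed(2)
expr <- matrix(rnorm(1000 * 20, mean = 8), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), NULL))
expr[c("gene1", "gene2"), ] <- 2
lnc_ids <- c("gene1", "gene2")
expression_percentile(expr, lnc_ids)   # gene1 0.15, gene2 0.15
```

Percentiles near the bottom would support the idea that these genes are weakly expressed; it does not prove that this is why they were left in the grey module.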


I have raw counts for around 100 tumor samples. My input was a matrix of 100 samples with 14k protein-coding genes plus my lncRNAs of interest, transformed with the variance-stabilising transformation from DESeq2:

vsd <- vst(data, blind=FALSE)

In addition to the 100 tumor samples, I also have about 50 normal samples. For WGCNA I used only the 100 tumor samples, because I wanted to find co-expressed genes specific to the tumor condition.


Could I get an answer to this, please?


There is no further answer to give, really. Variance-stabilised counts should be okay for WGCNA. You could try rlog counts instead, if you wished.

Going back a few steps, you should remove genes with low counts prior to normalisation in DESeq2. It seems strange that most of your lncRNAs are in the same module; my conclusion is that most of them were originally of low expression and perhaps should have been filtered out.
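A minimal base-R sketch of that pre-filtering step is shown below. The thresholds (at least 10 counts in at least 20% of samples) are illustrative assumptions, not values from this thread; the filtered matrix would then go into vst() as before.

```r
# Keep genes with at least min_count reads in at least min_frac of samples.
# counts: genes x samples matrix of raw counts. Thresholds are assumptions.
filter_low_counts <- function(counts, min_count = 10, min_frac = 0.2) {
  keep <- rowSums(counts >= min_count) >= ceiling(min_frac * ncol(counts))
  counts[keep, , drop = FALSE]
}

# Toy matrix: one low-count gene and one well-expressed gene (made up).
counts <- matrix(c(0, 1, 0, 2, 0,
                   50, 80, 65, 90, 70),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("lowGene", "highGene"), paste0("s", 1:5)))
filtered <- filter_low_counts(counts)
rownames(filtered)   # "highGene"

# Then, on the real data:
# vsd <- DESeq2::vst(filtered, blind = FALSE)
```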


Hey Kevin,

As I couldn't find any genes co-expressed with my lncRNA of interest using WGCNA, I tried a Pearson correlation analysis instead and filtered on p-value < 0.05.

Should the co-expressed genes be only the positively co-expressed ones, i.e. r > +0.5,

or should I also use the negatively correlated genes for pathway analysis?
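The correlation approach described here can be sketched in base R: a Pearson cor.test of one lncRNA against every other gene, keeping p < 0.05 and |r| > 0.5. The Benjamini-Hochberg column is an optional extra not mentioned in the thread; with around 19,800 tests, a raw p < 0.05 cutoff on its own would pass roughly 5% of unrelated genes by chance. The toy data and the names LNC1 and geneA to geneC are made up.

```r
# Pearson correlation of one target gene (the lncRNA) with every other gene.
# expr: genes x samples matrix (e.g. VST values); lnc: row name of the lncRNA.
cor_with_target <- function(expr, lnc) {
  target <- expr[lnc, ]
  others <- expr[rownames(expr) != lnc, , drop = FALSE]
  res <- t(apply(others, 1, function(g) {
    ct <- cor.test(g, target, method = "pearson")
    c(r = unname(ct$estimate), p = ct$p.value)
  }))
  res <- data.frame(gene = rownames(others), res,
                    row.names = NULL, stringsAsFactors = FALSE)
  res$padj <- p.adjust(res$p, method = "BH")   # optional FDR control
  res
}

# Toy data: 100 samples; geneA tracks the lncRNA, geneB mirrors it, geneC is noise.
set.seed(3)
n <- 100
lnc <- rnorm(n)
expr <- rbind(LNC1  = lnc,
              geneA = lnc + rnorm(n, sd = 0.3),
              geneB = -lnc + rnorm(n, sd = 0.3),
              geneC = rnorm(n))
tab <- cor_with_target(expr, "LNC1")
hits <- subset(tab, p < 0.05 & abs(r) > 0.5)
hits$gene   # "geneA" "geneB"
```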


The negatively correlated genes are just as informative as the positively correlated ones, no? You can analyse them together in pathway analysis, or do two separate analyses for:

  1. positive genes
  2. negative genes

You should only include correlations with p-value < 0.05.
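If you do run the two analyses separately, splitting the significant genes by the sign of r is straightforward. Below is a base-R sketch, assuming a table with columns gene, r and p such as one built from per-gene cor.test results. The values are made up for illustration. The cutoffs follow this thread (p < 0.05, r > +0.5), applied symmetrically to negative correlations as an assumption.

```r
# Split significant correlations into positive, negative and combined gene sets.
# tab: data frame with columns gene, r, p. Cutoffs are assumptions (see text).
split_by_direction <- function(tab, p_cut = 0.05, r_cut = 0.5) {
  sig <- tab[tab$p < p_cut & abs(tab$r) > r_cut, ]
  list(positive = sig$gene[sig$r > 0],
       negative = sig$gene[sig$r < 0],
       combined = sig$gene)
}

# Made-up example table.
tab <- data.frame(gene = c("geneA", "geneB", "geneC", "geneD"),
                  r = c(0.82, -0.71, 0.30, 0.65),
                  p = c(1e-6, 1e-4, 0.20, 0.08),
                  stringsAsFactors = FALSE)
sets <- split_by_direction(tab)
sets$positive   # "geneA"
sets$negative   # "geneB"
```

Each list element can then be used as its own input gene set for pathway enrichment, or the combined set can be used for a single analysis.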


Sure, thank you for the reply.
