Question

RNA seq data analysis and building co expression network

4.5 years ago

siu ▴ 160

Hi all, I have some questions about RNA-seq analysis. Any suggestions would help me a lot.

I am currently normalizing RNA-seq data to compare gene expression within and between samples. Which normalization method would you recommend for this type of analysis: FPKM, TPM, or TMM? I also want to make a heatmap of genes expressed in different conditions. Is transforming the normalized data (e.g. log2 or z-score) a good idea for this?
I also want to build a co-expression network. Does the choice of normalization (FPKM, TPM, TMM) have any influence on the resulting network?
Finally, I would like to use Pearson correlation, but I am unsure whether it only finds linear relationships in normally distributed data, since these normalization methods do not assume that counts are normally distributed. Is it a bad idea to compute pairwise gene co-expression with Pearson correlation? If so, which correlation method, combined with which normalization method, would you recommend as reliable for building co-expression networks?

Please help me with this, I will be very grateful to you.

Thanks in advance

rna-seq RNA-Seq R • 2.3k views

ADD COMMENT • link updated 4.5 years ago by Nicolas Rosewick 11k • written 4.5 years ago by siu ▴ 160

For your first question: I recommend checking out edgeR or DESeq2.

For the second question: you can use the WGCNA R package for co-expression network analysis. It expects normalized data as input, such as log2(counts+1), FPKM, or logCPM.

ADD REPLY • link 4.5 years ago by Vasu ▴ 790

Thanks, I will take a look at edgeR and DESeq2. I have 16 samples. Is it a good idea to use WGCNA for my analysis, given that the authors recommend more than 20 samples?

ADD REPLY • link 4.5 years ago by siu ▴ 160

score 0 · Answer 1 · 2020-06-09

See Section 4 of the WGCNA FAQ: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html

Can WGCNA be used to analyze RNA-Seq data? Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.
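As a rough illustration of the filtering rule quoted above, here is a minimal Python sketch using only the standard library. The function name `filter_low_counts`, the gene names, and the toy counts are all made up for this example; the thresholds (count below 10 in more than 90% of samples) are just the FAQ's example values and should be tuned to your own design and sequencing depth.

```python
def filter_low_counts(counts, min_count=10, max_low_fraction=0.9):
    """Keep features unless their count is below `min_count`
    in more than `max_low_fraction` of the samples."""
    kept = {}
    for gene, row in counts.items():
        n_low = sum(1 for c in row if c < min_count)
        if n_low / len(row) <= max_low_fraction:
            kept[gene] = row
    return kept


# Toy raw counts: hypothetical genes, 10 samples each.
counts = {
    "geneA": [0, 0, 1, 0, 2, 0, 0, 3, 0, 0],        # low in 10/10 samples: dropped
    "geneB": [0, 0, 0, 0, 0, 0, 0, 0, 40, 50],      # low in 8/10 samples: kept
    "geneC": [120, 98, 150, 110, 130, 90, 105, 99, 140, 125],  # never low: kept
}

filtered = filter_low_counts(counts)
print(sorted(filtered))
```

With real data the same rule would normally be applied to the full count matrix in R before any transformation.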

We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.
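The simpler alternative mentioned above, log2(x+1) on normalized counts, can be sketched as follows. This is only an illustration: counts-per-million scaling is my assumption for the "normalized counts" step (the FAQ does not prescribe a specific method), the helper names and toy data are hypothetical, and it is not a substitute for DESeq2's varianceStabilizingTransformation.

```python
import math


def counts_per_million(counts):
    """Scale every sample (column) so that its counts sum to one million."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def log2_plus_one(expr):
    """Apply log2(x + 1) to every value; the +1 keeps zeros finite."""
    return {gene: [math.log2(x + 1) for x in row] for gene, row in expr.items()}


# Toy data: sample 2 was sequenced at twice the depth of sample 1.
counts = {
    "geneA": [100, 200],
    "geneB": [300, 600],
    "geneC": [600, 1200],
}

log_expr = log2_plus_one(counts_per_million(counts))
for gene, row in log_expr.items():
    print(gene, [round(v, 3) for v in row])
```

After the sample-wise scaling, the two samples give identical values for each gene, so the depth difference no longer shows up in the log-transformed data.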

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)
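The point about scaling factors can be checked with a small experiment. The sketch below (plain Python, toy numbers invented for illustration) computes Pearson correlation between two genes, then rescales one gene by a constant (a gene-wise factor, like a gene-length correction) and, separately, rescales each sample by its own factor (a sample-wise factor, like sequencing depth).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy expression values for two genes across five samples.
gene_a = [5.0, 7.0, 6.0, 9.0, 8.0]
gene_b = [4.0, 3.0, 3.0, 3.0, 4.0]

r_original = pearson(gene_a, gene_b)

# Gene-wise scaling: one constant applied to all values of gene B.
r_gene_scaled = pearson(gene_a, [0.25 * v for v in gene_b])

# Sample-wise scaling: each sample gets its own factor, applied to both genes.
depth = [1.0, 3.0, 0.5, 1.0, 2.0]
r_sample_scaled = pearson(
    [d * a for d, a in zip(depth, gene_a)],
    [d * b for d, b in zip(depth, gene_b)],
)

print(round(r_original, 3), round(r_gene_scaled, 3), round(r_sample_scaled, 3))
```

In this toy case the gene-wise factor leaves the correlation unchanged, while the uncorrected sample-wise factors turn a weak negative correlation into a strong positive one. That is why samples must be normalized before building the network, whereas the choice between FPKM-style and count-style units matters much less for correlation.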

If data come from different batches, we recommend to check for batch effects and, if needed, adjust for them. We use ComBat for batch effect removal but other methods should also work.

Finally, we usually check quantile scatterplots to make sure there are no systematic shifts between samples; if sample quantiles show correlations (which they usually do), quantile normalization can be used to remove this effect.
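For readers unfamiliar with quantile normalization, here is a minimal sketch of the basic idea in plain Python: every sample is forced onto the same reference distribution, taken here as the mean of the sorted values across samples. The function name and toy values are hypothetical, ties are broken by position rather than averaged, and established R implementations handle such details more carefully.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (one equal-length list per sample)."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, four genes; sample 2 is shifted and stretched.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [10.0, 4.0, 8.0, 6.0],
]

normalized = quantile_normalize(samples)
for row in normalized:
    print(row)
```

After normalization both samples contain exactly the same set of values, while each gene keeps its rank within its own sample.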