RNA-seq data analysis and building a co-expression network
4.5 years ago
siu ▴ 160

Hi all, I have some questions about RNA-seq analysis. Any suggestions would help me a lot.

  1. I am currently normalizing RNA-seq data to compare gene expression within and between samples. Which normalization method would you recommend for this: FPKM, TPM, or TMM? I also want to make a heatmap of gene expression across conditions. Is it a good idea to transform the normalized data (for example log2 or z-score) for this?

  2. I also want to build a co-expression network. Does the choice of normalization (FPKM, TPM, TMM) influence the resulting network?

  3. I would like to use Pearson correlation, but I am unsure whether it only finds linear relationships in normally distributed data, since the normalization methods do not assume that counts are normally distributed. Is it a bad idea to compute pairwise co-expression of genes with Pearson correlation? If so, which correlation method, combined with which normalization method, would you recommend as reliable for building co-expression networks?

Please help me with this, I will be very grateful to you.

Thanks in advance

rna-seq RNA-Seq R • 2.3k views

For your first question, I recommend looking at edgeR or DESeq2.

For the second question, you can use the WGCNA R package for co-expression network analysis. It needs normalized data as input, such as log2(normalized counts + 1), FPKM, or logCPM.


Thanks, I will take a look at edgeR and DESeq2. I have 16 samples. Is it a good idea to use WGCNA for my analysis, given that the authors recommend more than 20 samples?

4.5 years ago

See section 4 of the WGCNA FAQ: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html

Can WGCNA be used to analyze RNA-Seq data? Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.
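The low-count filter described above can be sketched roughly as follows. This is only an illustration in plain Python, not the FAQ authors' code: the function name `filter_low_counts`, the toy count table, and the default thresholds (a count below 10 in more than 90% of samples, taken from the example in the text) are assumptions and should be tuned to the experiment.

```python
def filter_low_counts(counts, min_count=10, max_low_fraction=0.9):
    """Drop genes whose count is below min_count in more than
    max_low_fraction of the samples.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    """
    kept = {}
    for gene, values in counts.items():
        # Number of samples in which this gene is considered "low".
        n_low = sum(1 for v in values if v < min_count)
        if n_low / len(values) <= max_low_fraction:
            kept[gene] = values
    return kept


# Toy table: 3 genes x 10 samples (hypothetical values).
toy_counts = {
    # Low in 10/10 samples: removed.
    "geneA": [0, 1, 0, 2, 0, 0, 3, 0, 1, 0],
    # Low in 8/10 samples: kept, because 80% does not exceed 90%.
    "geneB": [0, 0, 0, 0, 0, 0, 0, 0, 40, 50],
    # Never low: kept.
    "geneC": [120, 95, 130, 88, 101, 97, 140, 110, 99, 105],
}

filtered = filter_low_counts(toy_counts)
print(sorted(filtered))
```

The same logic carries over to any table-oriented tool: count, per gene, the samples below the threshold and keep the gene only if that fraction is not too high.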

We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.
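As a rough sketch of the simpler alternative mentioned above (normalize, then apply log2(x+1)), the snippet below scales raw counts to counts per million and log-transforms them. It does not reproduce DESeq2's varianceStabilizingTransformation; the function name `log2_cpm`, the choice of counts per million as the normalization, and the toy data are assumptions made for illustration.

```python
import math


def log2_cpm(counts):
    """Library-size normalize raw counts to counts per million (CPM),
    then apply log2(x + 1).

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Returns a dict with the same shape holding log2(CPM + 1) values.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Total reads per sample (library size).
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [math.log2(counts[g][j] / lib_sizes[j] * 1e6 + 1)
            for j in range(n_samples)]
        for g in genes
    }


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1.
raw = {
    "g1": [100, 200],
    "g2": [300, 600],
    "g3": [600, 1200],
    "g4": [0, 0],
}
logged = log2_cpm(raw)
for gene, values in logged.items():
    print(gene, [round(v, 3) for v in values])
```

In this toy case the two samples end up with identical values, because the only difference between them was sequencing depth. The `+ 1` keeps zero counts defined (they map to 0).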

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)
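The point about scaling factors can be checked with a small numeric sketch. Pearson correlation is unchanged when one gene is multiplied by a constant (as a gene-length correction would do), but it does change when each sample carries its own uncorrected depth factor. The expression values and the `depth` factors below are made up for illustration, and `pearson` is a plain-Python helper, not a WGCNA function.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical normalized expression of two genes across five samples.
gene_a = [5.0, 8.0, 6.0, 9.0, 4.0]
gene_b = [2.0, 3.5, 2.0, 4.0, 1.5]

r = pearson(gene_a, gene_b)

# Gene-wise scaling: one constant factor applied to every value of gene_b.
r_gene_scaled = pearson(gene_a, [v * 0.25 for v in gene_b])

# Sample-wise scaling: each sample has its own factor, shared by both genes
# (what un-normalized sequencing depth would look like).
depth = [1.0, 3.0, 0.5, 2.0, 1.0]
r_sample_scaled = pearson(
    [a * d for a, d in zip(gene_a, depth)],
    [b * d for b, d in zip(gene_b, depth)],
)

print(round(r, 4), round(r_gene_scaled, 4), round(r_sample_scaled, 4))
```

Here the shared depth factor pushes the correlation towards 1, which is why samples need to be normalized before correlations are computed even though the gene-wise scaling is irrelevant.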

If data come from different batches, we recommend to check for batch effects and, if needed, adjust for them. We use ComBat for batch effect removal but other methods should also work.

Finally, we usually check quantile scatterplots to make sure there are no systematic shifts between samples; if sample quantiles show correlations (which they usually do), quantile normalization can be used to remove this effect.
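For readers unfamiliar with quantile normalization, here is a minimal sketch of the basic idea: every sample is forced onto the same distribution by replacing each value with the mean of the values that share its rank across samples. The function name `quantile_normalize` and the toy values are assumptions, and ties are broken by position rather than averaged, which established implementations may handle differently.

```python
def quantile_normalize(samples):
    """Basic quantile normalization.

    samples: list of equal-length lists, one list per sample,
             with genes in the same order in every list.
    Returns new lists in which all samples share one distribution.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean across samples of the i-th smallest value.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples)
        for i in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical log-expression values: 3 samples x 4 genes.
toy = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
qn = quantile_normalize(toy)
for row in qn:
    print([round(v, 3) for v in row])
```

After normalization each sample contains the same set of values, only assigned to genes according to their within-sample rank, so systematic shifts between sample quantiles are removed.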


removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples)

Can you explain how to remove genes with low counts in more than 90% of the samples in Excel?


Please do not use Excel for bioinformatics (or to handle any kind of genomic data). Use an appropriate tool or language, such as R, to manipulate gene count tables.
