I am trying to use GTEx RNA-seq V8 data to analyze differences in gene expression between human subcutaneous and visceral adipose tissue using DESeq2 and WGCNA. After browsing the GTEx website, I am now reconsidering the feasibility of this plan.
On the FAQ page of the GTEx website, I found the following statement:
It is not possible to directly compare these values across tissues. Unfortunately, no computational method can be applied for proper between-sample normalization across the diverse set of tissues represented in GTEx. For further details, please see this paper: https://doi.org/10.1016/j.cell.2012.10.012. The gene expression visualizations therefore provide a qualitative measure of relative expression.
Does this mean that I need to use the gene read counts rather than the TPM data for the analysis, or that I can NOT compare gene expression between these tissues using any of the GTEx RNA-seq data?
Firstly, you should always use counts rather than TPM when you are doing DE analysis. I would extend that to say that properly normalised counts should generally be used whenever you are comparing the levels of a given gene between different conditions (rather than the levels of different genes in the same condition).
There are two problems with comparing genes across tissues.
Most analyses assume that when I compare gene A in sample 1 to gene A in sample 2, gene A is the same thing in both cases. This is mostly (although not always) true when comparing, say, mutant vs wildtype of the same tissue, or treated vs untreated, or disease vs normal. However, when we compare across tissues, it is likely that the transcript structure of what we are calling "gene A" is different in the two tissues, and changes in read count might be due to these differences rather than to differences in transcript abundance. Some of this can be controlled for computationally, for example by correcting for differing effective gene lengths, using either salmon or RSEM together with tximport to pass gene-specific normalisation factors to DESeq2. Salmon is also able to correct for GC bias. However, without access to the GTEx raw data, you have to rely on the analysis tools they have used. I think they used RSEM, so you will have effective gene lengths, but not GC bias correction.
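To make the effective-length point concrete, here is a minimal arithmetic sketch. It assumes, as a simplification, that the expected number of reads from a gene is proportional to the number of transcript copies times the effective transcript length. The lengths, the copy number and the `reads_per_base` constant are all made-up illustrative values, and `expected_reads` is a hypothetical helper, not part of any package.

```python
import math

def expected_reads(n_transcripts, effective_length, reads_per_base=0.01):
    """Expected reads from a gene: proportional to copies x effective length.

    reads_per_base is an arbitrary illustrative constant.
    """
    return n_transcripts * effective_length * reads_per_base

# Same number of transcript copies of "gene A" in both tissues (hypothetical).
n_transcripts = 100

# Tissue A mostly expresses a long isoform, tissue B a short one (hypothetical).
len_a = 3000
len_b = 1500

reads_a = expected_reads(n_transcripts, len_a)
reads_b = expected_reads(n_transcripts, len_b)

# Raw reads suggest a fold change although transcript abundance is identical.
raw_fold_change = reads_a / reads_b

# Dividing by the effective length removes the isoform-length effect.
corrected_fold_change = (reads_a / len_a) / (reads_b / len_b)

print(f"raw fold change A/B:              {raw_fold_change:.2f}")
print(f"length-corrected fold change A/B: {corrected_fold_change:.2f}")
```

The per-gene, per-sample length correction is, roughly speaking, what the offsets produced by tximport are meant to supply; the sketch only shows why such a correction matters.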
RNAseq is inherently compositional. 10 reads/million in one sample only corresponds to the same number of transcripts as 10 reads/million in another sample if none of the other transcripts in the sample change expression level. If every gene in sample 2 had twice the expression level of the same gene in sample 1, then the number of reads recorded for each gene would be the same in samples 1 and 2: the composition hasn't changed, even if the absolute levels have. Worse, if a very highly expressed gene changes in expression, that changes the number of reads available for other transcripts: if your most highly expressed gene is upregulated, then all other genes will appear to be downregulated. In closely related samples we solve this either by excluding the most highly expressed genes from normalisation, or by assuming that the average log fold change between conditions is 0. But the same problem can also arise from a change in the expression of a large number of less highly expressed genes, which these normalisations wouldn't address.
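The compositional argument can be checked with a toy calculation. The sketch below uses made-up abundances for five genes and a simple median-of-ratios normalisation in the style of DESeq as a stand-in for "assuming most genes don't change". It is not the DESeq2 implementation, and all function and variable names are illustrative.

```python
import numpy as np

def cpm(abundance):
    """Convert absolute abundances to reads per million (composition only)."""
    abundance = np.asarray(abundance, dtype=float)
    return abundance / abundance.sum() * 1e6

def median_of_ratios_size_factors(samples):
    """Size factors for a samples x genes matrix of strictly positive values."""
    log_vals = np.log(samples)
    log_geo_mean = log_vals.mean(axis=0)          # per-gene geometric mean
    return np.exp(np.median(log_vals - log_geo_mean, axis=1))

def normalised_fold_change(sample1, sample2):
    """Per-gene sample2/sample1 ratio after median-of-ratios normalisation."""
    observed = np.vstack([cpm(sample1), cpm(sample2)])
    size_factors = median_of_ratios_size_factors(observed)
    normalised = observed / size_factors[:, None]
    return normalised[1] / normalised[0]

# Hypothetical true transcript abundances in sample 1.
base = np.array([1000.0, 200.0, 200.0, 200.0, 200.0])

# Case 1: every gene doubles. The composition, and so the CPM, is unchanged.
all_doubled = base * 2
print("CPM identical when all genes double:", np.allclose(cpm(base), cpm(all_doubled)))

# Case 2: only the most highly expressed gene goes up 4-fold.
top_up = base.copy()
top_up[0] *= 4
print("raw CPM ratio, top gene up:   ", np.round(cpm(top_up) / cpm(base), 3))
fold_top = normalised_fold_change(base, top_up)
print("normalised ratio, top gene up:", np.round(fold_top, 3))

# Case 3: three of the five genes go up 4-fold, so most genes change.
many_up = base.copy()
many_up[1:4] *= 4
fold_many = normalised_fold_change(base, many_up)
print("normalised ratio, many up:    ", np.round(fold_many, 3))
```

In case 2 the raw CPM of the four unchanged genes drops, but the normalisation recovers the true picture (one gene up, the rest unchanged). In case 3 the normalisation assumes the majority is unchanged, so the three upregulated genes look flat and the two genuinely unchanged genes look downregulated, which is the failure mode to worry about between very different tissues.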
This doesn't stop people doing DE between different tissues: despite these theoretical problems, it is done often. The extent to which the conclusions derived from such comparisons bear out in subsequent tests is not really addressed. My attitude would be that such comparisons are fine for generating hypotheses, but that those hypotheses need to be tested with orthogonal approaches before they are believed.
For WGCNA, I might derive independent networks for each tissue and then look at how those networks change between tissues, rather than building a single network from both tissues.
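As a rough illustration of the "one network per tissue, then compare" idea, here is a minimal numpy sketch. It is not WGCNA itself (which is an R package). It only mimics the unsigned soft-threshold adjacency, |cor|^beta, on simulated data and compares per-gene connectivity between two tissues. The data, the choice of beta = 6 and all names are assumptions made for the example.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for a samples x genes matrix, zero diagonal."""
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

rng = np.random.default_rng(0)
n_samples, n_genes = 50, 6

def simulate_tissue(coexpressed):
    """Simulated expression: genes in `coexpressed` share a common driver."""
    expr = rng.normal(size=(n_samples, n_genes))
    driver = rng.normal(size=n_samples)
    for g in coexpressed:
        expr[:, g] = driver + 0.3 * rng.normal(size=n_samples)
    return expr

# Hypothetical scenario: genes 0-2 are co-expressed in tissue A, genes 3-5 in tissue B.
tissue_a = simulate_tissue([0, 1, 2])
tissue_b = simulate_tissue([3, 4, 5])

# Build each network from its own tissue only.
adj_a = soft_threshold_adjacency(tissue_a)
adj_b = soft_threshold_adjacency(tissue_b)

# Connectivity is the sum of a gene's adjacencies; compare it between tissues.
conn_a = adj_a.sum(axis=0)
conn_b = adj_b.sum(axis=0)
conn_diff = conn_b - conn_a

for gene in range(n_genes):
    print(f"gene {gene}: connectivity A={conn_a[gene]:.2f}  B={conn_b[gene]:.2f}  "
          f"difference={conn_diff[gene]:+.2f}")
```

Because each network only uses correlations within a single tissue, the between-tissue normalisation problems described above do not feed into the network construction. They only matter when you interpret the differences.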