2.6 years ago
Yang Shi
Dear community,
Survival and correlation analyses are usually conducted on Xena RSEM TPM/FPKM data. However, there is a thread (Normalisation of RNAseq data from UCSC Xena Browser) indicating that these files are normalised for library size but not across samples. Can I still use these data for survival and correlation analyses, or even for differential gene expression analysis using wilcox.test? Thanks in advance! Kevin Blighe
FYI, there's a Xena file that does the DESeq2-type normalization on the RSEM counts:
https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Hi Delaney,
Thanks for the information. Could I also conduct DEG analysis based on expected_count? (https://xenabrowser.net/datapages/?dataset=TcgaTargetGtex_gene_expected_count&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443)
Furthermore, I wonder whether the RSEM FPKM/TPM data could be used for DEG analysis with wilcox.test. Thanks in advance!
The file you linked to is actually log2(expected_count+1) -- but you can convert it back into expected_count and plug it into a differential gene expression program like limma or DESeq2.
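To make the back-conversion concrete, here is a minimal sketch in Python. The input values are made up for illustration and the function name is hypothetical; the only thing taken from the file description is the transform log2(expected_count+1), which is inverted as 2^y - 1. Rounding is included because DESeq2 expects integer counts, while RSEM expected counts can be fractional.

```python
import math

def log2p1_to_counts(log_values):
    """Invert y = log2(x + 1), giving x = 2**y - 1."""
    return [2.0 ** y - 1.0 for y in log_values]

# Hypothetical values as they might appear in a log2(expected_count+1) file.
logged = [0.0, 1.0, 3.0, math.log2(101)]

counts = log2p1_to_counts(logged)

# Round to integers before passing to a tool that requires integer counts.
rounded = [int(round(c)) for c in counts]
print(rounded)
```

In practice you would apply the same arithmetic to the whole gene-by-sample matrix after loading it.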
I'd recommend doing the wilcox.test on the DESeq2-normalized counts (what I linked to) rather than TPMs (TPMs aren't great when you're doing statistical analysis to compare different samples).
I usually take the HTSeq data, convert it back to raw count, and then import to DESeq2, as per the info in my other thread.
Using FPKM/TPM with the Wilcoxon test is 'okay', I suppose. The results would be slightly biased, as FPKM/TPM values are scaled within each sample but not normalised across samples.
Hi Kevin! I have read your thread, and I noticed that you mentioned "Data from the same sample but from different vials/portions/analytes/aliquotes is averaged" for the HTSeq data. However, my aim is to do DEG analysis comparing normal and tumor tissues, given the limited number of normal samples in TCGA. What confuses me is that the description of those data does not mention cross-sample normalisation, so I'm not sure whether they were processed like the HTSeq data you chose. https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Could I do DEG analysis based on either RSEM expected_count or RSEM expected_count (DESeq2 standardized)? Actually, I'm not sure what the difference between those is.
If you're using DESeq2 or edgeR or some differential expression package to get your p-values, you should use RSEM expected_count (i.e. raw counts that are NOT log-transformed and are NOT DESeq2-standardized); differential expression packages will automatically do the cross-sample normalization for you.
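To illustrate what "cross-sample normalization" means here, below is a simplified sketch of the median-of-ratios idea that DESeq2-type size factors are based on. It is not DESeq2's actual code, the function names are hypothetical, and the toy counts are invented: sample 2 is sample 1 sequenced twice as deep, plus one gene with a zero count.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is gene g in sample s."""
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample (log(0) is undefined).
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lg
                      for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Toy matrix (genes x samples), invented for illustration.
counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
sf = size_factors(counts)
norm = normalize(counts, sf)
print(sf)
print(norm[0])
```

After dividing by the size factors, the first three genes have equal values in both samples, which is the depth difference being removed. Differential expression packages estimate factors like these internally from raw counts, which is why they should not be given already-normalized values.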
If you're doing a Wilcoxon test, then use the RSEM expected_count (DESeq2 standardized).
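For reference, a per-gene Wilcoxon rank-sum (Mann-Whitney U) test can be sketched as follows. This is a hand-written illustration using the normal approximation with a tie correction and no continuity correction, so its p-values will not exactly match R's wilcox.test, which by default uses an exact test for small samples without ties and a continuity correction otherwise. The expression values are invented.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic for group x
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return u, p

# Hypothetical normalized expression of one gene in normal vs tumor samples.
normal = [1.0, 2.0, 3.0, 4.0, 5.0]
tumor = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = rank_sum_test(normal, tumor)
print(u_stat, p_value)
```

When testing many genes this way, the resulting p-values would still need multiple-testing correction.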
Finally, note that RSEM is not HTSeq (RSEM is more accurate than HTSeq because of the way it handles multimapping) but you can use the RSEM counts just like you would use HTSeq counts.
Got that, many thanks!