Hello!
Could someone kindly explain to me how to select the third quartile (Q3) gene expression values? I'm processing RNAseq data from TCGA studies. For the purpose of further gene cauterization I first performed log2 normalization, then median centering, after which I naturally got a lot of negative values. I can't understand how to compute Q3 by columns: should I calculate Q3 on all the data or separately for each column? If the latter, a gene might be above Q3 in one column but not in the others. Aren't we losing important differences between samples that way? And finally, aren't we skewing the data by keeping only values that are >Q3? Sorry, I am a beginner in RNAseq data analysis. Thank you very much in advance for your help!
What is gene cauterization, and why use this non-standard approach rather than established normalization methods such as the ones from DESeq2 or edgeR? Do you mean upper-quartile normalization?
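In case it helps pin down the question, here is a minimal sketch of what upper-quartile normalization usually means: each sample (column) is rescaled so that its own Q3 lands on a fixed target, and no values are thrown away. It is only an illustration under some assumptions: the values are on the linear scale (before any log2 transform), Q3 is taken over all genes in the column, and the target is set to 1000. The function names and the toy matrix are made up for the example and are not taken from any TCGA pipeline.

```python
from statistics import quantiles


def upper_quartile(values):
    """Return the third quartile (Q3) of a list of numbers."""
    # quantiles(..., n=4) returns [Q1, Q2, Q3]; "inclusive" treats the
    # values as the whole population rather than a sample from one.
    return quantiles(values, n=4, method="inclusive")[2]


def upper_quartile_normalize(matrix, target=1000.0):
    """Scale every sample (column) so that its Q3 equals `target`.

    `matrix` is a list of gene rows, each row a list of per-sample
    expression values on the linear scale. A column whose Q3 is 0
    would cause a division by zero; this sketch does not handle that.
    """
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # One scaling factor per sample, computed from that sample only.
    factors = [target / upper_quartile(col) for col in columns]
    return [[row[j] * factors[j] for j in range(n_samples)] for row in matrix]


# Toy data (hypothetical): 5 genes x 2 samples, where sample 2 is simply
# sample 1 sequenced twice as deeply.
expr = [
    [10.0, 20.0],
    [20.0, 40.0],
    [30.0, 60.0],
    [40.0, 80.0],
    [50.0, 100.0],
]
normalized = upper_quartile_normalize(expr)
```

In this toy case the two columns come out identical after scaling, which is the point of doing it per column: it removes a sample-wide depth difference while keeping every gene. That is a different operation from keeping only the genes whose values are above Q3.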
I read that using ConsensusClusterPlus is a common approach. I am trying to figure out: are the TCGA data_mrna_seq_v2_rsem.txt data already normalized to a fixed upper-quartile value of 1000 for gene-level and 300 for transcript-level estimates, or not?