9.2 years ago
CHANG
- This post says the TCGA RNAseqV2 rsem.genes.normalized_results values are calculated as follows: "For gene level estimates you divide all "raw_count" values by the 75th percentile of the column (after removing zeros) and multiply that by 1000." What is the reason for multiplying by 1000?
- To avoid problems with zero counts during log2 transformation, people typically add 1 to the read counts. Is this done before the upper-quartile normalization step? Adding 1 after normalization doesn't seem to make sense to me, because some normalized read counts can be really small (e.g. 0.0001), so log2(0.0001) versus log2(1.0001) would be a huge difference.
Or do people typically add 1 only to the (normalized) counts that are 0 before log2 transformation?
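
To make the question concrete, here is a minimal Python sketch of the procedure as I understand it from the quoted description. The function name `upper_quartile_normalize`, the toy counts, and the use of NumPy's default percentile interpolation are my own assumptions, not the actual TCGA pipeline code.

```python
import numpy as np

def upper_quartile_normalize(raw_counts, scale=1000.0):
    """Divide counts by the 75th percentile of the non-zero values, then rescale.

    Hypothetical helper written from the quoted description; the real
    TCGA pipeline may compute the percentile differently.
    """
    raw = np.asarray(raw_counts, dtype=float)
    nonzero = raw[raw > 0]            # "after removing zeros"
    q75 = np.percentile(nonzero, 75)  # upper quartile of one sample's column
    return raw / q75 * scale

# Toy raw counts for one sample (made up for illustration only).
raw = np.array([0, 0, 2, 10, 50, 100, 400, 1000], dtype=float)
norm = upper_quartile_normalize(raw)

# Option A: add the pseudocount after normalization.
log_after = np.log2(norm + 1)

# Option B: add the pseudocount to the raw counts, then normalize.
log_before = np.log2(upper_quartile_normalize(raw + 1))

# The concern from the question: the effect of +1 on a very small value.
small = 0.0001
print(np.log2(small), np.log2(small + 1))
```

With the factor of 1000, the upper quartile of the non-zero normalized values in a sample is 1000 by construction, so most expressed genes end up well above 1 before the +1 is applied.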