Dear All,
I am following the TCGAbiolinks tutorial for conducting differential expression analysis on TCGA data ("TCGAanalyze: Analyze data from TCGA" section). I have 2 questions about it.
1) I don't understand the following: when dealing with legacy=TRUE data (platform = "Illumina HiSeq", file.type = "results"), they normalize to correct for gene length (TCGAanalyze_Normalization with the default parameters); but when dealing with legacy=FALSE data (workflow.type = "HTSeq - Counts"), they normalize to correct for GC content (TCGAanalyze_Normalization with method = "gcContent"). What is the reason for this difference?
2) If I want to use the TCGAanalyze_DEA function with pipeline=limma, should I use the same normalization methods as for pipeline=edgeR? If not, which one should I use for the legacy=FALSE and legacy=TRUE data, respectively?
I hope you can help. Thanks in advance!
Erica