I want to use the genefu package to predict PAM50 subtypes from RNA-seq data. My question is: should I use the raw data (read counts), or should I transform the data prior to prediction?
Values that have been normalized and transformed for between-sample comparison are appropriate.
I ran the PAM50 classification on expression data derived from the read counts by median-ratio normalization and variance-stabilizing transformation (both are functions in DESeq2). I then compared the results with those predicted from TPM values. They are quite similar, although some samples (< 10) were assigned to a different class.
Hi. For running PAM50 analyses, do you 1) normalize your entire gene expression matrix or rather 2) select only the subset of genes that PAM50 uses for classification and then do the normalization?
I normalized the entire expression matrix first, then used those 50 genes for the PAM50 classification.
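To make the order of operations concrete, here is a minimal sketch in plain Python of "normalize the whole matrix, then subset". It is not the DESeq2 or genefu code: `size_factors` and `normalize` are hypothetical helpers that imitate the median-of-ratios idea, the counts are made-up toy values, only three of the 50 genes are listed for brevity, and the variance-stabilizing step is left out.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative helper, not DESeq2 itself).

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # genes with a zero count have no usable geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # one factor per sample: the median ratio to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}

# Toy data (made up): two samples, the second sequenced at twice the depth.
counts = {
    "ESR1": [100, 200],
    "ERBB2": [50, 100],
    "MKI67": [30, 60],
    "ACTB": [1000, 2000],
    "GAPDH": [800, 1600],
}
pam50_genes = ["ESR1", "ERBB2", "MKI67"]  # three of the 50, for brevity

# Step 1: normalize using ALL genes. Step 2: keep only the classifier genes.
normalized = normalize(counts)
pam50_input = {g: normalized[g] for g in pam50_genes}
```

In this toy case the two samples differ only in sequencing depth, so after normalization each gene has the same value in both. The size factors are estimated from every gene in the matrix, which is why the subsetting comes second.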
But why do you normalize? Wouldn't each sample be treated independently when classifying its PAM50 status?