Sorry for any mistakes. English is not my native language.
I am trying to create a machine learning model that takes TCGA proteome profiling data as its input.
I hope someone can help me understand the meaning of the columns and values in the downloaded TCGA proteome matrices.
I downloaded the TCGA proteome matrices with the following options (using the TCGAbiolinks package in R):
library(TCGAbiolinks)
query_protein <- GDCquery(
  project = "TCGA-BRCA",
  data.category = "Proteome Profiling",
  data.type = "Protein Expression Quantification",
  experimental.strategy = "Reverse Phase Protein Array"
)
GDCdownload(query_protein)
Then I opened one of the downloaded TSV files.
The column in question is the "protein_expression" column.
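For anyone who wants to reproduce the check, here is a minimal sketch of how the column can be inspected in base R. It does not read a real download: it writes a small toy TSV with invented values, and the `peptide_target` column name is an assumption used only for illustration (only `protein_expression` is taken from the file described above). With a real file, `tsv_path` would point at one of the TSV files fetched by `GDCdownload()`.

```r
# Toy stand-in for one downloaded TSV file.
# All values are invented; "peptide_target" is an assumed column name.
toy <- data.frame(
  peptide_target = c("AKT", "EGFR", "ERALPHA", "HER2"),
  protein_expression = c(-0.52, 1.31, 0.07, -1.10)
)
tsv_path <- tempfile(fileext = ".tsv")
write.table(toy, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-separated file and look at the distribution of the column.
rppa <- read.delim(tsv_path, stringsAsFactors = FALSE)
expr_range <- range(rppa$protein_expression, na.rm = TRUE)
n_negative <- sum(rppa$protein_expression < 0, na.rm = TRUE)

print(summary(rppa$protein_expression))
cat("Number of negative values:", n_negative, "\n")
```

A range that spans zero, as in this toy data, is the pattern the questions below are about.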
- Why are there some negative values in the column?
- Does it mean that some sort of normalization/standardization measures were used?
- If so, is there detailed documentation of the methods that were used?
- If no normalization/standardization was applied, are there any R packages that can normalize/standardize these data? The data should be normalized before I use them as input for my machine learning model.
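To make the last question concrete, this is a minimal sketch of one generic option, per-protein z-scoring with base R's `scale()`. It assumes the per-sample files have already been combined into a samples-by-proteins matrix. The matrix below is a toy with invented values, and the sketch says nothing about whether the downloaded values have already been processed upstream.

```r
# Toy matrix: rows are samples, columns are proteins (invented values).
expr <- matrix(
  c(-0.5, 1.2,  0.3,
     0.1, 0.8, -0.9,
     0.4, 1.0,  0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("AKT", "EGFR", "HER2"))
)

# Center each column to mean 0 and scale it to standard deviation 1.
expr_z <- scale(expr, center = TRUE, scale = TRUE)

# Sanity check: column means should be ~0 and column SDs ~1.
col_means <- colMeans(expr_z)
col_sds <- apply(expr_z, 2, sd)
print(round(col_means, 10))
print(round(col_sds, 10))
```

For a machine learning model, the centering and scaling parameters would normally be estimated on the training samples only and then reused for the test samples.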
There are a lot of useful tips and resources about TCGA transcriptome data on the internet. However, I was unable to find good guidance on analyzing TCGA proteome data. Your help would be greatly appreciated!