2.2 years ago
LJM
I have downloaded a dataset containing the proteomes of a number of cancer cell lines. There are a lot of missing values in this dataset. I am looking at imputation, and want to insert 0 only when the protein is not expressed in that cell line. I am working in Python, and had in mind a basic strategy: download a list of the proteins actually expressed in each cell line, then use it to impute 0 values only in those specific instances.
Could someone advise on a way to do this? A specific Python library/API would be ideal.
Isn't this exactly what you have done by downloading the proteomics data for these cells?
That would be assuming that all NA values in the spreadsheet are simply due to the cells not expressing that protein, so no.
The data is from the supplementary information of a paper that performed mass-spectrometry-based proteomic analysis of a large number of cell lines. I want to be able to confirm which proteins are not expressed in those cell lines, as many NA values could also be technical artifacts.
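For what it's worth, the strategy described in the question could be sketched in pandas roughly as below. This is only a minimal sketch under stated assumptions: the abundance table is assumed to have proteins as rows and cell lines as columns, and `expressed` is a hypothetical boolean table of independent expression evidence that you would have to build yourself from whatever source you trust. The function name, the example protein IDs, and the values are all made up for illustration; nothing here comes from the paper or from a specific database API.

```python
import numpy as np
import pandas as pd


def impute_unexpressed_as_zero(abundance: pd.DataFrame,
                               expressed: pd.DataFrame) -> pd.DataFrame:
    """Replace NaN with 0 only where independent evidence says 'not expressed'.

    abundance: proteins x cell lines, NaN = missing (assumed layout).
    expressed: hypothetical boolean table, True = expressed, False = not expressed.
    """
    # Align the evidence to the abundance table. Protein/cell-line pairs
    # without any evidence become NaN here, i.e. "unknown".
    evidence = expressed.reindex(index=abundance.index, columns=abundance.columns)

    # Only an explicit False counts as confirmed absence; unknown stays unknown.
    confirmed_absent = evidence.eq(False)

    # Fill 0 where the value is missing AND absence is confirmed.
    fill_here = abundance.isna() & confirmed_absent
    return abundance.mask(fill_here, 0.0)


# Toy data, invented purely for illustration.
abundance = pd.DataFrame(
    {"A549": [1.2, np.nan, np.nan], "HeLa": [np.nan, 3.4, np.nan]},
    index=["P1", "P2", "P3"],
)
expressed = pd.DataFrame(
    {"A549": [True, False], "HeLa": [False, True]},
    index=["P1", "P2"],  # no evidence at all for P3
)

imputed = impute_unexpressed_as_zero(abundance, expressed)
print(imputed)
```

In this toy example the NA for P2 in A549 and for P1 in HeLa become 0, while both P3 values stay NA because there is no evidence either way. Those remaining NAs are the ones that could be technical artifacts, so they are left for a separate imputation method rather than being set to 0.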