I'm working on a project looking at protein complex stoichiometry in cancer, so I need proteomics data that lets me compare the abundance of different proteins to each other within each sample.
I was planning to use the CCLE data, but this preprint says it is normalised so that each protein is compared to itself across samples, which means the values are not comparable between proteins. I'm not sure I can use correlations instead, because I have to do this on an individual-sample basis.
Does anyone know how I could use the CCLE data, or if there's another proteomics data source that might be suitable?
I think the general problem with comparing "things" (proteins, genes, transcripts, metabolites) within a sample is that the measurements are biased in a way that is specific to each "thing". For example, in proteomics one commonly starts with some kind of enzymatic digestion before loading the sample onto the mass spec. However, different proteins have different cleavage sites for, say, trypsin, so you get vastly different sets of peptides, and this directly affects mappability. Some peptides will be unique to one protein while others will be ambiguous, or the digestion cuts the protein up so heavily that the resulting peptides cannot be detected by MS or mapped to the reference. Hence, comparing a protein that yields many unique peptides to one that was largely lost during sample preparation might naively suggest that their abundances are very different, but that difference can be purely technical; in the cell the abundances may initially have been the same.
The same applies to RNA-seq: here we sequence and map transcripts, but GC content and mappability biases make it hard to say how different transcripts compare to each other per cell.
In contrast, between samples the above-mentioned biases should be the same, because in a pairwise comparison we always compare the same "thing" with itself.
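To make the argument concrete, here is a toy sketch in Python. It assumes the simplest possible model, where the observed signal is the true abundance multiplied by a protein-specific detection efficiency that is constant across samples. All numbers and variable names are made up for illustration; real data will have additional sample-level and noise effects.

```python
import numpy as np

# Hypothetical true abundances (arbitrary units): 2 proteins x 3 samples.
# Proteins A and B are equally abundant within every sample.
true_abundance = np.array([
    [100.0, 200.0, 50.0],  # protein A
    [100.0, 200.0, 50.0],  # protein B
])

# Assumed protein-specific detection efficiency (digestion, detectability,
# mappability), taken to be identical in every sample.
efficiency = np.array([0.9, 0.1])

# Observed signal = true abundance scaled by each protein's efficiency.
observed = true_abundance * efficiency[:, None]

# Within-sample comparison (A vs B): the efficiencies do not cancel,
# so A looks far more abundant than B although the true ratio is 1.
within_sample_ratio = observed[0] / observed[1]

# Between-sample comparison (sample 2 vs sample 1, per protein):
# each protein's efficiency cancels, so the true fold change is recovered.
between_sample_ratio = observed[:, 1] / observed[:, 0]

print("A/B within each sample:", within_sample_ratio)
print("sample 2 / sample 1 per protein:", between_sample_ratio)
```

Under this assumption the within-sample A/B ratio reflects only the ratio of efficiencies, while the between-sample ratio matches the true two-fold change for both proteins.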