Hi everyone!
I'm trying to answer some questions about protein abundance in healthy and diseased human tissues using publicly available mass spectrometry data. I have a pipeline planned, but because I'm new to proteomic analysis, I'm not sure whether I'm making any glaring errors.
As an example, say I am interested in comparing protein abundance between psoriatic skin and atherosclerotic plaques. I don't have the means to collect this data myself, so I go to PRIDE and use samples from the following datasets:
a) https://www.ebi.ac.uk/pride/archive/projects/PXD021673 (psoriasis)
b) https://www.ebi.ac.uk/pride/archive/projects/PXD035555 (atherosclerotic plaque)
Then, I do the following processing:
- I convert the .RAW files to .mzML (with peak-picking enabled)
- For each separate experiment, I use OpenMS to do feature detection
- For each separate experiment, I use OpenMS to do feature map retention time alignment
- For each separate experiment, I use OpenMS to do feature linking
- For each separate experiment, I use OpenMS to do an accurate mass search
- For each separate experiment, I do QC (imputation/filtering)
- I should now have intensities for each protein in each sample in each experiment
- For each protein, I do a Kruskal-Wallis test. Group 1 consists of the psoriasis samples. Group 2 consists of the atherosclerotic plaque samples.
- I apply an FDR correction to the resulting p-values and draw a volcano plot to find differentially abundant proteins
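To make the last two steps concrete, here is a minimal sketch of what I have in mind, in plain Python. It assumes I already have a per-protein table of intensities for the two groups; the protein names and numbers below are invented purely for illustration. The Kruskal-Wallis p-value uses the chi-square approximation with one degree of freedom (which may be rough for small groups), and the FDR step is Benjamini-Hochberg, which is my assumption about which correction to use. In practice I would probably call `scipy.stats.kruskal` and `statsmodels.stats.multitest.multipletests` instead. Note that this does nothing about differences between the two datasets themselves.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and p-value for two groups (chi-square approx., df=1)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a = sum(ranks[:len(a)])
    r_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return h, math.erfc(math.sqrt(h / 2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def mean_log2(values):
    """Mean of log2 intensities (assumes strictly positive values)."""
    return sum(math.log2(v) for v in values) / len(values)


def volcano_rows(intensities):
    """Per-protein log2 fold change (group 1 minus group 2), p and q values."""
    proteins = sorted(intensities)
    pvals = [kruskal_two_groups(*intensities[p])[1] for p in proteins]
    qvals = benjamini_hochberg(pvals)
    rows = []
    for prot, p, q in zip(proteins, pvals, qvals):
        g1, g2 = intensities[prot]
        rows.append({"protein": prot, "log2fc": mean_log2(g1) - mean_log2(g2),
                     "p": p, "q": q})
    return rows


# Invented toy data: protein -> (psoriasis intensities, plaque intensities).
toy = {
    "PROT_A": ([100, 120, 110, 130], [400, 420, 390, 450]),
    "PROT_B": ([200, 210, 190, 205], [195, 208, 202, 199]),
}
for row in volcano_rows(toy):
    print(row)
```

The volcano plot would then put `log2fc` on the x-axis and `-log10` of the p-value (or q-value) on the y-axis.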
Does this seem sensible? Am I making any glaring errors?
My main hesitation relates to comparing data from two different experiments. I am also unsure whether the experiments need to have been performed with the same instrument.
Thank you very much for your time. Any references to exemplar papers that I could consult would be greatly appreciated.