Hi,
I'm trying to deal with mass spectroscopy data that I uploaded with MALDIquant libray. I have 384 spectra. In MALDIquant, spectra are stored in an object composed of a mass vector, an intensity vector and metaData. All spectra are in a list object. One problem to handle all spectra and for example to normalize them and align peaks is that all mass vectors are not equivalent (in length and in content).
To circumvent this, i'd like to reduced the data. Indeed, for the mass 1000 Da, for example, i have up to 12 isolated values :
spectra.bc[[1]]@mass[1:10]
[1] 1000.129 1000.201 1000.273 1000.346 1000.418 1000.490 1000.563 1000.635 1000.708 1000.780
The typical length of a spectra is around 96000 values. I think for my purpose that a precision of 1 Da would be sufficient, so I would like to do compute a matrix of 10 001 rows (values of mass from 1000 to 10000) and intensities of each samples/spectra in columns, with something like this :
(sub is a list of 16 spectra. Each spectra s has masses in s@mass and intensities in s@intensity)
m = matrix(nr=10001,nc=length(sub),byrow=F,
data=lapply(sub, function(s) {
sapply(1000:10000,function(x) {
median(s@intensity[which(round(s@mass)==x)])
})
}))
But even with this portion of code, it takes about two minutes for one spectra to compute a thousand values (i ran a test with sapply(1000:2000, ...)). So it's very time consuming.
Do you know if i could improve the script to do the job faster ?
Thanks a lot,
Julien
Here is the description of the spectrum object :
MassSpectrum Object extends AbstractMassObject with slots :
.cache: "environment", internal use only mass: "vector", mass or mass-to-charge ratio intensity: "vector", intensities for measured mass-to-charge ratios metaData: "list", some metadata to describe the spectrum
spectra is a list of MassSpectrum objects.
The problem comes from the fact that if you have a list of 10 object (spectra) and do :
lapply(spectra, function(x) {length(x@mass)})
you don't get the same length for each mas vector.
In fact, i would like to put the intensities of each sample into a column of a matrix, but for that, i have to define a similar list of rows = masses for every sample. As i don't care for such precision, i imagined i could reduce the data by taking only one value of intensity for all intensities of similar masses ie when "round(s@mass) == x". The idea is that at start i have a vector of length 96000, and at the end, i will have a vector of length 10001, with the first 10 values transformed into one median, the next 12 values transformed into a second median, etc ...
I hope it is clearer that way but i'm not sure !
I don't know much about mass spec data so bear with me. So you basically want to: for every mass value, get the median of the list of intensities?
Yes, kind of, in fact i want to reduce the length of the vector, and i want to have a vector of the same length for all samples. At the beginning, i have for spectra 1 a list of mass like : 1000.129, 1000.201, ... , 1345.134, 1345.245, ... and for example for spectra 2 I would have a list of different masses : 1000.198, 1000.267, ... But in fact, i don't care about such precision, so i would like to get only one value for all 1000's values, and only one for all 1345's values. Then, it would be possible to have a matrix with a unique list of masses (integers from 1000 to 10000) as row.names(m)
Is your data in tab delimited format? Can you show me a couple of lines of the data?
In fact, i was talking of data already loaded in R, because data are in a proprietary format from bucker daltonics, and the library allows for automatic input. I'll put the description of the object in an answer (comment to short)
It might be faster to output it to text file and use python or something to deal with it. Loading it into a matrix and then working with it in R tend to be more process intensive.
Yes, i thought of that, but to do that, i would have to export the spectra one by one to separate text files. But that's a lead, and maybe i can use the function that was written to import data into R to automatically create these text files. Thanks for the idea !