This is a long post, detailing my observations with questions at the end.
I have (foolishly) volunteered to look at some proteomics data in a proprietary format. My aim is to convert it to mzXML, then generate "annotated PMF spectra" - that is, plots of intensity versus m/z in which peaks are labelled with their mass and perhaps the matching peptide and its position in the protein.
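To make the end goal concrete, here is a minimal sketch of the labelling step, assuming the spectrum has already been read into two equal-length lists of m/z and intensity values. The function names, the local-maximum rule and the 5% relative-intensity cutoff are illustrative assumptions on my part, not part of any of the tools mentioned here.

```python
def pick_peaks(mz, intensity, min_rel_intensity=0.05):
    """Return (m/z, intensity) pairs for local maxima above a relative cutoff.

    Hypothetical helper: a real PMF workflow would normally smooth the
    spectrum and remove isotope peaks first.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    if not intensity:
        return []
    cutoff = max(intensity) * min_rel_intensity
    peaks = []
    # Skip the first and last points: they have only one neighbour.
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        if y >= cutoff and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append((mz[i], y))
    return peaks


def peak_labels(peaks, decimals=2):
    """Format the m/z of each picked peak as a text label."""
    return ["%.*f" % (decimals, m) for m, _ in peaks]


if __name__ == "__main__":
    mz = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0]
    intensity = [0, 10, 100, 10, 0]
    peaks = pick_peaks(mz, intensity)
    print(peak_labels(peaks))
```

The labels could then be handed to whatever plotting library is in use and drawn next to each peak; peptide names and sequence positions would need a separate matching step against a theoretical digest.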
The data are from a Voyager DE-STR MALDI-TOF instrument; the file suffix is ".dat". It seems from this information that there are few options. One is to install PyMsXML on a Windows machine which also has the proprietary Data Explorer software. PyMsXML appears not to have been updated since 2007. Another possibility might be the executables from ProteoWizard, again requiring Data Explorer.
This is tremendously painful for me, since I never use Windows. However, I do have a version of WinXP installed as a virtual machine using VirtualBox (Ubuntu). I have worked through the PyMsXML installation guide, with the following results:
1. Download and install ActivePython
The latest version from ActiveState is 2.7.0.2 (or for Python3, 3.1.2.4). However, the 2.7 version does not appear to include the "COM Makepy utility" referred to in the PyMsXML instructions. I downloaded and installed the earliest available free version, 2.5.5.7, which does include the utility.
2. Install Data Explorer
I have been sent a zip archive. Confusingly, it is named "DataExplorer5.1.zip", but the actual version seems to be 4.0.0.0. Anyway, it seems to install and run OK.
3. Install COM library interfaces
The instructions are to open the COM Makepy utility and look for "ExploreDataObjects 1.0 Type Library (1.0)" and "IDAExplorer 1.0 Type Library (1.0)" - the latter is for .dat files. Neither of these exists. However, there is a library named "Data Explorer 4.2 Type Library (4.2)". The interface to this appears to install correctly.
The PyMsXML instructions then refer to a couple of tests to check installation. The test for Analyst files fails, but the one for Data Explorer appears to pass.
4. Download, install and edit the PyMsXML scripts
This step is fine. Next - run on a test file. I run:
pymsxml -R voyager -o myfile.mzXML myfile.dat
And I get the error:
Traceback (most recent call last):
File "C:\bin\pymsxml.py", line 1796, in <module>
x.write(debug=opts.debug)
File "C:\bin\pymsxml.py", line 83, in write
self.write_scans(tmpFile,debug)
File "C:\bin\pymsxml.py", line 300, in write_scans
for (s,d) in self.reader.spectra():
File "C:\bin\pymsxml.py", line 1528, in spectra
(tf,fixedMass) = doc.InstrumentSettings.GetSetting(self.delib.constants.dePreCursorIon,i-1,None)
AttributeError: class constants has no attribute 'dePreCursorIon'
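One way to investigate an error like this is to list which names the generated interface does define. The sketch below assumes, as the error message suggests, that the makepy-generated module exposes a `constants` class whose attributes are the enumeration values. `find_constants` and `FakeConstants` (including its attribute names) are hypothetical, made up purely so the example runs without Data Explorer installed.

```python
def find_constants(constants, fragment):
    """List public attribute names on `constants` that contain `fragment`.

    The match is case-insensitive, so a constant whose capitalisation
    differs between type library versions would still be found.
    """
    fragment = fragment.lower()
    return sorted(
        name for name in dir(constants)
        if not name.startswith("_") and fragment in name.lower()
    )


class FakeConstants:
    # Hypothetical stand-in for the generated `constants` class;
    # these names and values are invented for illustration only.
    dePrecursorMass = 1
    deBasePeak = 2


if __name__ == "__main__":
    print(find_constants(FakeConstants, "precursor"))
```

Inside pymsxml.py the same function could in principle be called on `self.delib.constants` just before the failing line. An empty result would suggest that the 4.2 type library does not provide this setting at all, rather than providing it under a different name.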
I have much less to say about the ProteoWizard executables: they all fail to run with the message "The system cannot execute the specified program." I briefly attempted to build from source under Cygwin, but gave that up as a waste of time.
So my questions are:
- Has anyone got PyMsXML to run using a version of ActivePython later than 2.4?
- Any idea what the PyMsXML error message means?
- Any tips at all for getting PyMsXML, ProteoWizard or any other tool to convert Voyager .dat files to mzXML?
@neilfws: Enjoyed reading your blog on this question, and congrats on the paper!