8.6 years ago
Emma
I am trying to use msconvert to convert .raw files to mzML. The raw files are ~1.5 GB and seem to be too big: msconvert only gets part-way through each file and then stops, so my output mzML files are incomplete. I found this out when I tried to search them with X!Tandem and got 'syntax error parsing XML' back.
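One way to confirm that a converted file is actually truncated (rather than just large) is to stream-parse it and compare the `count` attribute on `<spectrumList>` with the number of complete `<spectrum>` elements present. The sketch below uses only Python's standard library; `check_mzml` is a hypothetical helper, and it assumes the standard mzML 1.1 layout. A file that stops part-way through will usually raise a parse error here, much like the error X!Tandem reported.

```python
import sys
import xml.etree.ElementTree as ET


def check_mzml(path):
    """Stream-parse an mzML file and return (declared, found).

    declared: the count attribute on <spectrumList> (None if the tag never appears)
    found:    the number of complete <spectrum> elements actually parsed
    Raises ET.ParseError if the XML is malformed or cut off part-way through.
    """
    declared = None
    found = 0
    for event, elem in ET.iterparse(path, events=("start", "end")):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if event == "start" and tag == "spectrumList":
            declared = int(elem.get("count"))
        elif event == "end" and tag == "spectrum":
            found += 1
            elem.clear()  # release peak data we no longer need


    return declared, found


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            declared, found = check_mzml(path)
        except (ET.ParseError, OSError) as err:
            print(f"{path}: could not be parsed completely ({err})")
            continue
        status = "OK" if declared == found else "MISMATCH"
        print(f"{path}: {status} (declared={declared}, found={found})")
```

Run it as `python check_mzml.py file1.mzML file2.mzML`; a MISMATCH or parse error suggests the conversion did not finish.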
Is there a way to split the raw file before or during the msconvert conversion to mzML, so that I get several smaller output files that will work?
Thank you.
I thought about the filtering options (and tried them a bit) but wasn't sure, because I want to keep all my data. Are you suggesting I run it several times, each time filtering for a different subset of the data and specifying a different output file?
Yes, this is what I was thinking. You should then be able to concatenate the individual files. I'm not sure how easy or feasible this is, as I don't have much experience with .raw data.
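A minimal sketch of that filter-based split, assuming msconvert's `scanNumber` filter (inclusive scan-number range) and `--outfile` option behave as listed in `msconvert --help` for your ProteoWizard version; check that before relying on it. The file name `sample.raw`, the `_partN.mzML` naming scheme, the helper names, and the scan counts are all hypothetical placeholders. You need to know (or look up) the total number of scans in the .raw file; Thermo scan numbers are assumed to start at 1.

```python
import shlex
import subprocess
from pathlib import Path


def chunk_ranges(total_scans, chunk_size):
    """Split scans 1..total_scans into consecutive, inclusive (first, last) ranges."""
    if total_scans < 1 or chunk_size < 1:
        raise ValueError("total_scans and chunk_size must be positive")
    return [
        (first, min(first + chunk_size - 1, total_scans))
        for first in range(1, total_scans + 1, chunk_size)
    ]


def msconvert_commands(raw_file, total_scans, chunk_size):
    """Build one msconvert command per scan-number chunk (hypothetical naming scheme)."""
    stem = Path(raw_file).stem
    commands = []
    for part, (first, last) in enumerate(chunk_ranges(total_scans, chunk_size), start=1):
        commands.append([
            "msconvert", str(raw_file), "--mzML",
            # Keep only scans first..last (inclusive) in this output file.
            "--filter", f"scanNumber {first}-{last}",
            "--outfile", f"{stem}_part{part}.mzML",
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command (POSIX shell quoting); execute it unless dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: 80,000 scans in chunks of 20,000 -> 4 output files.
    run_all(msconvert_commands("sample.raw", 80_000, 20_000))
```

The default `dry_run=True` only prints the commands so the ranges can be checked first; passing `dry_run=False` runs msconvert once per chunk. Because the ranges are contiguous and non-overlapping, every scan ends up in exactly one output file.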
If the concatenated file gets too large, you can search the "decomposed" files separately: each should be a schema-valid mzML file, and X!Tandem scoring of a spectrum does not depend on the other spectra in the file. You can then merge the results (the smaller XML/pepXML/mzIdentML files) before proceeding with statistical validation etc. This works at least as long as you only care about peptide IDs and do spectral counting.
I have the same problem with ~2 GB / 80,000-spectrum .raw files. With the latest msconvert I get a complete mzML file without any error or warning messages, but X!Tandem and COMET still cannot use it. I will post a solution or workaround here if I find one.
Thank you. This is what I ended up doing, and my final protein list looks as I expected. I did have a problem concatenating, though, because each smaller file had its own beginning and end section; opening the files in a text editor, removing these blocks of text and putting the right pieces at the end of the file was going to be a lot of work and error-prone.
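Rather than editing the beginning and end sections by hand, the chunks could be merged programmatically: keep everything from the first file, append the `<spectrum>` elements from the others, then renumber the `index` attributes and update the `count` on `<spectrumList>`. This is a rough sketch, and `merge_mzml` is a hypothetical helper. It assumes that all chunks came from the same .raw file (so spectrum `id`s do not clash), that the merged data fits in memory, and that dropping the `<indexedmzML>` byte-offset index is acceptable, since those offsets would no longer be valid. Chromatograms and other metadata are taken from the first chunk only.

```python
import xml.etree.ElementTree as ET

NS = "http://psi.hupo.org/ms/mzml"
# Write the mzML namespace as the default (no "ns0:" prefixes) and keep the xsi prefix.
ET.register_namespace("", NS)
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def _mzml_root(path):
    """Return the <mzML> element, unwrapping an <indexedmzML> wrapper if present."""
    root = ET.parse(path).getroot()
    if root.tag == f"{{{NS}}}indexedmzML":
        root = root.find(f"{{{NS}}}mzML")
    return root


def merge_mzml(paths, out_path):
    """Merge the spectra of several mzML chunks into one (non-indexed) mzML file."""
    base = _mzml_root(paths[0])
    spec_list = base.find(f".//{{{NS}}}spectrumList")
    for path in paths[1:]:
        other = _mzml_root(path).find(f".//{{{NS}}}spectrumList")
        spec_list.extend(other.findall(f"{{{NS}}}spectrum"))
    spectra = spec_list.findall(f"{{{NS}}}spectrum")
    for i, spectrum in enumerate(spectra):
        spectrum.set("index", str(i))  # indices must run 0..n-1 across the merged file
    spec_list.set("count", str(len(spectra)))
    ET.ElementTree(base).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(spectra)
```

For example, `merge_mzml(["sample_part1.mzML", "sample_part2.mzML"], "sample_merged.mzML")` returns the total number of spectra written. Pass the chunks in scan order so the merged spectra stay in acquisition order. For files this size a streaming approach would use far less memory, so treat this as an illustration of which parts need fixing up rather than a drop-in tool.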