Official specs and APIs for proteomics data (like ADAM, VCF, HTS-JDK), for both file and distributed storage
2
0
10.0 years ago
William ★ 5.3k

Is there a (semi-)official specification and API for proteomics data, for both file-based and distributed storage?

Like there is for Genomics data:

Specs:

File storage

http://samtools.github.io/hts-specs

Distributed storage

https://github.com/bigdatagenomics/adam

https://github.com/bigdatagenomics/bdg-formats

API:

File storage

https://github.com/samtools/htsjdk

API for reading from distributed storage?

I remember there are formats like mzML, mzIdentML and mzQuantML from the HUPO Proteomics Standards Initiative. Have these taken off (i.e. become widely accepted and used) as the standards for proteomics data? Is there also an API (like HTS-JDK) and a distributed storage variant (an Eva to Adam :) )?

genomics proteomics api specification • 4.0k views
2
10.0 years ago
Laurent ★ 1.7k

These formats are the official formats for proteomics data: mzML for raw data, mzIdentML for identification data and mzQuantML for quantitation data. mzML is widely accepted, although its predecessor mzXML is still being used. mzIdentML is also widely used. mzQuantML, which is more recent, is not used that much, as far as I know. Note that there is also mzTab, for processed data in tabular format. There are more formats on the PSI page, for instance for interaction data.
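
To give an idea of what "tabular" means for mzTab, here is a minimal sketch in Python using only the standard library. It relies on the mzTab convention that every line is tab-separated and starts with a prefix (`MTD` for metadata, `PSH`/`PSM` for the PSM header and rows, `COM` for comments, and so on). The function name `parse_mztab` and the sample values are made up for illustration, and the sample has far fewer columns than a real file would.

```python
# Maps each header-line prefix to the prefix of the rows it describes.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section lists of row dicts.

    Hypothetical helper: no validation, no handling of optional columns.
    """
    metadata = {}
    headers = {}
    tables = {row: [] for row in HEADER_TO_ROW.values()}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue  # comment line
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, tables


# Illustrative content only; the accession and peptide are invented.
SAMPLE_LINES = [
    ["MTD", "mzTab-version", "1.0.0"],
    ["MTD", "mzTab-mode", "Summary"],
    ["COM", "illustrative values only"],
    ["PSH", "sequence", "PSM_ID", "accession", "charge"],
    ["PSM", "PEPTIDEK", "1", "P12345", "2"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in SAMPLE_LINES)

metadata, tables = parse_mztab(SAMPLE)
```

All values come back as strings; converting columns such as `charge` to numbers is left out to keep the sketch short.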

There are APIs in multiple languages; there is a lot of Java out there, as that is what is heavily used by many of the PSI members. See ProteoWizard (pwiz) for a C++ interface. mzR is an R interface and is based on pwiz. I bet there are Python interfaces out there. I remember seeing a list of available APIs on the PSI page, although I can't find it right now. I doubt they have public mailing lists.
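
For a feel of what such an API does under the hood, here is a rough Python sketch that pulls peak arrays out of mzML using only the standard library. In mzML, each `binaryDataArray` holds base64-encoded little-endian floats, optionally zlib-compressed, and `cvParam` accessions say which is which. The function names and the stripped-down sample fragment are made up for illustration (a real mzML file has many more required elements), and the sketch assumes anything not flagged as 64-bit float is 32-bit float. For real files a dedicated library is the better choice.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib

NS = {"ms": "http://psi.hupo.org/ms/mzml"}

# PSI-MS controlled vocabulary accessions used on binary arrays.
MZ_ARRAY = "MS:1000514"
INTENSITY_ARRAY = "MS:1000515"
FLOAT64 = "MS:1000523"
ZLIB = "MS:1000574"


def decode_binary_array(element):
    """Decode one <binaryDataArray> into (kind, list of floats)."""
    accessions = {p.get("accession") for p in element.findall("ms:cvParam", NS)}
    raw = base64.b64decode(element.findtext("ms:binary", default="", namespaces=NS))
    if ZLIB in accessions:
        raw = zlib.decompress(raw)
    # Assumption: arrays not marked as 64-bit float are 32-bit float.
    fmt, size = ("d", 8) if FLOAT64 in accessions else ("f", 4)
    values = list(struct.unpack("<%d%s" % (len(raw) // size, fmt), raw))
    if MZ_ARRAY in accessions:
        kind = "m/z"
    elif INTENSITY_ARRAY in accessions:
        kind = "intensity"
    else:
        kind = "other"
    return kind, values


def read_spectra(xml_text):
    """Yield (spectrum id, {kind: values}) for each <spectrum> element."""
    root = ET.fromstring(xml_text)
    for spectrum in root.iter("{%s}spectrum" % NS["ms"]):
        arrays = dict(
            decode_binary_array(b)
            for b in spectrum.iter("{%s}binaryDataArray" % NS["ms"])
        )
        yield spectrum.get("id"), arrays


def _encode(values):
    """Pack floats as little-endian 64-bit values and base64-encode them."""
    packed = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(packed).decode("ascii")


# Stripped-down fragment for illustration; not a schema-valid mzML file.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList count="1">
<spectrum id="scan=1" index="0"><binaryDataArrayList count="2">
<binaryDataArray><cvParam accession="MS:1000523"/><cvParam accession="MS:1000514"/>
<binary>{mz}</binary></binaryDataArray>
<binaryDataArray><cvParam accession="MS:1000523"/><cvParam accession="MS:1000515"/>
<binary>{intensity}</binary></binaryDataArray>
</binaryDataArrayList></spectrum></spectrumList></run></mzML>""".format(
    mz=_encode([100.0, 200.5]), intensity=_encode([10.0, 20.0])
)

spectra = list(read_spectra(SAMPLE))
```

Here `spectra` is a list of `(id, arrays)` pairs, with `arrays["m/z"]` and `arrays["intensity"]` holding the decoded peaks. Integer-encoded arrays and other compression schemes are ignored.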

Hope this helps.

Edit: oh, and I forgot the latest qcML for quality control... (but not very widely used, as far as I know).

0

Thanks very much. I also found a list of tools that import or export mzIdentML on the PSI website: http://www.psidev.info/tools-implementing-mzidentml I don't know whether these tools also support the other formats, though.

0

In terms of identification, mzIdentML (version 1.1) is widely used. I think pepXML is still in use in the TPP suite. OpenMS also has/had its own XML-based formats (at least internally), but nowadays supports the official ones.

2
9.9 years ago
William ★ 5.3k

This presentation also gives a good overview of the proteomics formats and APIs. The presentation seems to be fairly recent.

http://www.psidev.info/sites/default/files/presentations/PROCESS.pptx
