We are looking for postdocs interested in generative models for mass spectrometry data. The project, which is also described at https://goo.gl/Zdh1Qx among several other projects, is outlined below.
Project title: Generative models for mass spectrometry
Timo Koski, Department of Mathematics
Lukas Käll, Department of Gene Technology, KTH & SciLife Lab
Description: Most functions of a cell are carried out by proteins, hence identifying proteins and determining their concentrations is seen as the most direct way to determine the current state of a cell or organism. Here mass spectrometry (MS) plays a central role in characterizing target proteins and protein products. MS has become established as the primary method for protein identification from complex mixtures of biological origin. The applications of MS include the determination of protein molecular mass, peptide mapping, peptide sequencing, ligand binding, determination of disulfide bonds, active site characterization of enzymes, protein self-association, protein folding/higher-order structural characterization, and post-translational modifications.
A frequently identified bottleneck for the implementation of MS is the analysis of data. Modern mass spectrometers produce massive amounts of data. State-of-the-art mass spectrometers, such as the Thermo Fusion, produce more than 24 GB of compressed data per day. For individual laboratories this is often problematic, as the wealth of data strains the lab's infrastructure in the form of data storage and computational processing time. The future success of MS-based proteomics can be claimed to depend on efficient methods for large-scale data analysis and data-driven modeling. In data-driven modeling, techniques from data analysis are used to infer a mathematical model directly from data without explicitly describing the system. Statistical learning theory provides several such methods, which take into account all the data but do not offer any conceptual simplification of the phenomena that are studied. The central task treated here is to develop mathematical-statistical methods and algorithms for inferring proteins from tandem (fragment) mass spectrometry data. Due to the experimental set-up, this is a challenging inference problem.
The problem of identifying proteins by MS-based proteomics has not been definitively solved. The rigorous assessment of the statistical significance of protein identifications is a mathematically challenging open problem. For example, the problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein's presence, has turned out to be very hard. This is a proposal for new results in probability/statistical learning theory that address the problem of protein identification in tandem mass spectrometry. The new techniques will be developed through research on deep generative models, such as Generative Adversarial Networks (GANs) and Sum-Product Networks (SPNs), which involve subdomains such as sparse signal modeling and probabilistic graphical models.
The Stockholm-Uppsala region is the leading, and a growing, cluster of biotechnology and biomedicine in Sweden. As KTH Royal Institute of Technology hosts Science for Life Laboratory (SciLife Lab), the national resource in biotechnology, we find the research in this proposal to have excellent chances of contributing mathematical and statistical expertise to biotechnology and biomedicine, as well as of opening and developing new fields of mathematical and statistical research. Mathematics and computational science play a key role in effectively translating the vast amount of data from experimental and high-throughput analysis of biological samples into deeper knowledge about the biological systems under study.
The full ad can be found here: https://goo.gl/eUEV4J