Based on the post Machine learning reveals unexplored gene/protein combinations in cell signalling pathway (OUP publication)
Chronology of work (April 2013 till now)
Research project # 1
Duration of research - April 1, 2013 to March 10, 2025 / 11 years 11 months 9 days
Foundation - The design principles of the machine learning based search engine were first presented as posters at Wnt EMBO 2016 (documented in https://www.biorxiv.org/content/early/2017/08/08/059469 , for static data) and at Berkeley Cell Symposia: Technology, Biology and Data Science 2016 (documented in https://www.biorxiv.org/content/early/2017/07/20/060228 , for time series data).
Culminated in -
BMC Systems Biology https://doi.org/10.1186/s12918-017-0488-z (4 December 2017)
Integrative Biology, Oxford University Press, https://doi.org/10.1093/intbio/zyae020 (28 November 2024)
Data set source -
(Type - static) https://doi.org/10.1016/j.ccr.2008.04.019
(Type - time series) https://doi.org/10.1371/journal.pone.0010024
Code availability - https://zenodo.org/records/14637456 at CERN based Zenodo (freely available under Creative Commons Attribution 4.0 International)
Project - Time Behavioural Study of 3rd Order Combinations in WNT3A Stimulated HEK 293 Cells
Total number of research papers - 16
Link to research papers - https://osf.io/hcbq7/files/osfstorage
License - CC-By Attribution 4.0 International
What to find - Clicking the "Files" tab opens a list of genes/proteins. For each gene/protein of interest, a time behavioural study has been made of the 3rd order combinations that involve it. Each gene/protein directory contains two files -
a preprint/unpublished article containing the time behavioural study using the OUP published machine learning based search engine
a respective Supplementary-[GeneName].zip file, containing approximately 2400 3rd order gene combinations with changing rankings over time. These machine learning ranked/prioritised combinations, some of which are validated and most of which are unexplored/untested, need wet lab testing.
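To make the term "3rd order combination" concrete, here is a minimal Python sketch. It is only an illustration and not the published search engine: the gene panel and the helper name combinations_with_gene are assumptions made for this example, and no ranking is performed. It simply lists every unordered triple that contains the gene/protein of interest.

```python
from itertools import combinations
from math import comb


def combinations_with_gene(genes, gene_of_interest, order=3):
    """Return every `order`-sized gene combination that includes `gene_of_interest`.

    Hypothetical helper for illustration; it only enumerates combinations
    and does not score or rank them.
    """
    if gene_of_interest not in genes:
        raise ValueError(f"{gene_of_interest} is not in the gene panel")
    # Remove duplicates and the gene of interest, then pick the remaining members.
    others = sorted(g for g in set(genes) if g != gene_of_interest)
    return [
        tuple(sorted((gene_of_interest,) + rest))
        for rest in combinations(others, order - 1)
    ]


# Hypothetical five-gene panel, chosen only for this example.
genes = ["AXIN2", "DKK1", "LEF1", "MYC", "WNT3A"]
triples = combinations_with_gene(genes, "WNT3A", order=3)

for triple in triples:
    print(triple)

# With n genes in the panel, a gene of interest appears in C(n-1, 2) triples.
print(len(triples), comb(len(genes) - 1, 2))
```

The count grows quickly with the size of the panel, which is why a ranked list per gene/protein is more practical to work through than the raw enumeration.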
Research project # 2
Duration of research - August 1, 2015 to March 10, 2025 / 9 years 7 months 9 days
Foundation - The drug ETC-1922159 was released in Singapore in July 2015 under the auspices of the Agency for Science, Technology and Research (A*STAR) and Duke-National University of Singapore Graduate Medical School (Duke-NUS). A related publication recorded the regulation (up/down) of some 5000 genes after ETC-1922159 was tested on colorectal cancer cells (the recordings are available online with the published paper). I tested a modification of the above machine learning based search engine (now published in OUP) on this static ETC-1922159 data set and discovered various 2nd/3rd order combinations of genes that might affect various pathways after the drug was administered. Aspects of the work and unpublished results were presented as a poster at the Wnt Signaling Gordon Research Conference, "Wnt Signaling: A pathway implicated in Animal Development, Stem Cell Control and Cancer", 2017.
Culminated in - The adaptation of the search engine and the related unpublished research work are available as a preprint on bioRxiv at https://www.biorxiv.org/content/10.1101/180927v2 (Version 1 on August 26, 2017); the latest version, with a chronological literature review, is at https://osf.io/dn5h6 (March 6, 2025), and a related pedagogical code paper is on Qeios at https://www.qeios.com/read/DPKY8G (Jan 30, 2025).
Data set source - https://www.nature.com/articles/onc2015280 (Type - static)
Code availability - https://zenodo.org/records/14636112 at CERN based Zenodo (freely available under Creative Commons Attribution 4.0 International)
Project - Machine learning discoveries of 2nd order synergy in ETC-1922159 treated colorectal cancer cells
Total number of research papers - 30
Link to research papers - https://osf.io/ngef9/files/osfstorage
License - CC-By Attribution 4.0 International
What to find - Clicking the "Files" tab opens a list of genes/proteins. For each one there is a preprint/unpublished article containing the 2nd order discoveries related to that particular gene/protein of interest. These machine learning ranked/prioritised combinations, some of which are validated and most of which are unexplored/untested, need wet lab testing.
What has been done - The machine learning part is complete, to a certain extent. Interested researchers/students can test as well as build on this material under the terms and conditions of the CC-By Attribution 4.0 International license.
As far as my limited grasp goes, this work will generate multiple research manuscripts, each addressing a particular gene of interest and its possible combinations that remain unexplored. The machine learning based search engine is data independent, except that the data processing/extraction part of the code must be tuned to each data set. To paraphrase, the engine reveals known and unknown combinations of order 2 or more, in the form of discoveries. The discoveries documented on the Open Science Framework platform are like a bunch of leaves collected in the palms, and this is what has been presented over these past 10 to 12 years. However, the forest out there contains numerous leaves that might never be collected in the palms in one lifetime, let alone analysed and studied deeply one by one.
Hope this small contribution is of some help in research work in cell/cancer biology.
best regards & take care
shriprakash