Based on the post Machine learning reveals unexplored gene/protein combinations in cell signalling pathway (OUP publication)
What has been done - The machine learning part is complete, to a certain extent. Interested researchers/students can test and build on this material under the terms and conditions of the CC-By Attribution 4.0 International license.
Chronology of work (April 2013 till now)
Research project # 1
Duration of research - April 1, 2013 to March 10, 2025 / 11 years 11 months 9 days
Foundation - The design principles of the machine learning based search engine were first presented as posters at Wnt EMBO 2016 (documented in https://www.biorxiv.org/content/early/2017/08/08/059469 , for static data) and at Berkeley Cell Symposia: Technology, Biology and Data Science 2016 (documented in https://www.biorxiv.org/content/early/2017/07/20/060228 , for time series data).
Culminated in -
BMC Systems Biology https://doi.org/10.1186/s12918-017-0488-z (4 December 2017)
Integrative Biology, Oxford University Press, https://doi.org/10.1093/intbio/zyae020 (28 November 2024)
Data set source -
(Type - static) https://doi.org/10.1016/j.ccr.2008.04.019
(Type - time series) https://doi.org/10.1371/journal.pone.0010024
Code availability - https://zenodo.org/records/14637456 on the CERN-based Zenodo repository (freely available under Creative Commons Attribution 4.0 International)
Project - Time Behavioural Study of 3rd Order Combinations in WNT3A Stimulated HEK 293 Cells
Total number of research papers - 23
Statistics -
- Total no. of downloads - 313
- Last updated - March 27, 2025
- Servers - preprints.org and osf.io
List of genes currently covered -
A - Individual / unique genes
- beta-transducin repeat containing E3 ubiquitin protein ligase or F-box/WD repeat-containing protein 1A (BTRC/FBXW1A)
- CXXC-type zinc finger protein 4 (CXXC4)
- E1A binding protein p300 / histone acetyltransferase p300 (EP300)
- follicle stimulating hormone subunit beta (FSHB)
- forkhead box N1 (FOXN1)
- FOS like 1, AP-1 transcription factor subunit (FOSL1)
- frequently rearranged in advanced T-cell lymphomas / FRAT regulator of WNT signaling pathway 1 (FRAT1)
- KIAA1735/Dixin DIX domain containing 1 (DIXDC1)
- kringle containing transmembrane protein 1 (KREMEN1)
- nemo like kinase (NLK)
- NKD inhibitor of WNT signaling pathway 1 / naked cuticle homolog 1 (NKD1)
- paired-like homeodomain transcription factor 2 (PITX2)
- porcupine O-acyltransferase (PORCN)
- protein phosphatase 2A catalytic subunit, alpha isoform (PPP2CA)
- protein phosphatase 2A structural/scaffold subunit A, alpha isoform (PPP2R1A)
- pygopus family PHD finger 1 (PYGO1)
- ras homolog family member U / Wnt-1 responsive Cdc42 homolog (RHOU/WRCH1)
- SUMO1/sentrin/SMT3 specific peptidase 2 (SENP2/AXAM2)
- v-jun avian sarcoma virus 17 oncogene homolog / Jun proto-oncogene, AP-1 transcription factor subunit (JUN)
- WNT inhibitory factor 1 (WIF1)
B - Gene family
- Wnt family members (WNT - 1/2B/3A/4/5A)
- Casein kinase family members (CSNK - 1(A1/D/G1) & 2A1)
Link to research papers - https://osf.io/hcbq7/files/osfstorage
License - CC-By Attribution 4.0 International
What to find - Clicking the "Files" tab opens a list of genes/proteins; for each one, a time behavioural study has been made of the 3rd order combinations involving that gene/protein of interest. Each gene/protein directory contains two files -
- a preprint/unpublished article containing the time behavioural study, carried out with the OUP published machine learning based search engine
- a corresponding Supplementary-[GeneName].zip file, containing approximately 2400 3rd order gene combinations whose rankings change over time. These machine learning ranked/prioritised combinations, some of which are validated and most of which are unexplored/untested, need wet lab testing.
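To make "3rd order combinations of a gene of interest" and "changing rankings over time" concrete, here is a minimal Python sketch. It is not the published search engine: the function names, the toy gene panel and the two toy rankings are all assumptions made for illustration, and no scoring method is implied. The panel size of 71 in the last line is likewise hypothetical; it is shown only because it gives a count close to the "approximately 2400" mentioned above.

```python
from itertools import combinations
from math import comb


def combos_with_gene(genes, gene_of_interest, order=3):
    """Return every combination of `order` genes that contains `gene_of_interest`."""
    others = [g for g in genes if g != gene_of_interest]
    return [
        tuple(sorted((gene_of_interest,) + rest))
        for rest in combinations(others, order - 1)
    ]


def rank_shift(ranking_t1, ranking_t2):
    """Map each combination to (rank at t2 - rank at t1); positive means it moved down."""
    pos1 = {c: i for i, c in enumerate(ranking_t1, start=1)}
    pos2 = {c: i for i, c in enumerate(ranking_t2, start=1)}
    return {c: pos2[c] - pos1[c] for c in pos1 if c in pos2}


# Toy panel of five genes taken from the list above (illustrative only).
genes = ["NLK", "WIF1", "PORCN", "JUN", "FOSL1"]
triples = combos_with_gene(genes, "NLK")

# Two made-up rankings at two time points: the second simply reverses the first.
ranking_t1 = triples
ranking_t2 = list(reversed(triples))
shifts = rank_shift(ranking_t1, ranking_t2)

print(len(triples), "triples contain NLK in a panel of", len(genes))
for combo, shift in shifts.items():
    print(combo, "rank change:", shift)

# With a hypothetical panel of 71 genes, one gene appears in C(70, 2) triples.
print(comb(70, 2))
```

In a real analysis the two rankings would come from the files in the Supplementary-[GeneName].zip archive rather than from a reversed list.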
Research project # 2
Duration of research - August 1, 2015 to March 10, 2025 / 9 years 7 months 9 days
Foundation - The drug ETC-1922159 was released in Singapore in July 2015 under the flagship of the Agency for Science, Technology and Research (A*STAR) and Duke-National University of Singapore Graduate Medical School (Duke-NUS). A related publication recorded the regulation (up/down) of some 5000 genes after ETC-1922159 was tested on colorectal cancer cells; the data are available online with the published paper. I tested a modification of the above machine learning based search engine (now published in OUP) on this static ETC-1922159 data set and discovered various 2nd/3rd order combinations of genes that might affect various pathways after the drug was administered. Aspects of the work and unpublished results were presented as a poster at the Wnt Signaling Gordon Research Conference, "Wnt Signaling: A Pathway Implicated in Animal Development, Stem Cell Control and Cancer", 2017.
Culminated in -
The adaptation of the search engine and the related unpublished research work are available as a preprint on bioRxiv at https://www.biorxiv.org/content/10.1101/180927v2 (Version 1 on August 26, 2017); the latest version, with a chronological literature review, is at https://osf.io/dn5h6 (March 6, 2025)
A related pedagogical code paper is on Qeios at https://www.qeios.com/read/DPKY8G (Jan 30, 2025)
Data set source - (Type - static) https://www.nature.com/articles/onc2015280
Code availability - https://zenodo.org/records/14636112 on the CERN-based Zenodo repository (freely available under Creative Commons Attribution 4.0 International)
Project - Machine learning discoveries of 2nd order synergy in ETC-1922159 treated colorectal cancer cells
Total number of research papers - 32
Statistics -
- Total no. of downloads - 1050
- Last updated - March 27, 2025
- Servers - preprints.org, ssrn.com and osf.io
List of genes currently covered -
A - Individual / unique genes
- Achaete-scute complex homolog 2 (ASCL2)
- anthrax toxin receptor cell adhesion molecule 2 (ANTXR2)
- ATPase H+ transporting V(0/1) subunit e2 (ATP6V-(0/1)-E2)
- Aurora kinase B (AURKB)
- Autophagy related 3 (ATG3)
- Bloom syndrome/BLM RecQ like helicase (BLM)
- budding uninhibited by benzimidazoles 1 mitotic checkpoint serine/threonine kinase (BUB1)
- DNA topoisomerase II alpha (TOP2A)
- Fanconi anemia complementation group D2 (FANCD2)
- Forkhead box protein M1 (FOXM1)
- go-ichi-ni-san complex subunit 1 (GINS1 / PSF1)
- hydrogen voltage gated channel 1 (HVCN1)
- Methyltransferase 3, N6-adenosine-methyltransferase complex catalytic subunit (METTL3)
- v-myc avian myelocytomatosis viral oncogene homolog (MYC)
- Homeobox protein Hox-B8 (HOXB8)
- polo like kinase 4, serine/threonine-protein kinase (PLK4)
- RHINO RAD9-HUS1-RAD1 interacting nuclear orphan 1 (RHNO1)
- Six-transmembrane epithelial antigen of prostate 3 (STEAP3)
- timeless circadian regulator (TIMELESS)
- wntless Wnt ligand secretion mediator (WLS)
B - Gene family
- ATP-binding cassette transporters (ABC)
- B cell CLL/lymphoma (BCL)
- DNA gene repair family
- Interleukin (IL)
- Nuclear factor kappa-light-chain-enhancer of activated B cells (NFkB)
- poliovirus receptor-related (PVR)
- tumor necrosis factor (TNF)
- Wnt family member (WNT)
Link to research papers - https://osf.io/ngef9/files/osfstorage
License - CC-By Attribution 4.0 International
What to find - Clicking the "Files" tab opens a list of genes/proteins; for each one, there is a preprint/unpublished article containing the 2nd order discoveries related to that particular gene/protein of interest. These machine learning ranked/prioritised combinations, some of which are validated and most of which are unexplored/untested, need wet lab testing.
As far as I understand, this work will generate multiple research manuscripts, each addressing a particular gene of interest and its possible combinations that remain unexplored. The machine learning based search engine is data independent, except that the data processing/extraction part of the code must be tuned to each data set. In other words, the engine reveals known and unknown combinations of order 2 or more, in the form of discoveries. The discoveries documented on the Open Science Framework platform are like a handful of leaves collected in the palms, and this is what has been presented over these past 10 to 12 years. The forest out there, however, contains countless leaves that might never be collected in the palms in one lifetime, let alone analysed and studied deeply.
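The size of that forest can be put into numbers with a short counting sketch in Python. It takes the "some 5000 genes" of the ETC-1922159 data set as exactly 5000, which is an assumption made only for illustration, and the function name is hypothetical. It counts combinations only and says nothing about how the search engine ranks them.

```python
from math import comb


def count_combinations(n_genes, order):
    """Return (all combinations of `order` genes, those containing one chosen gene)."""
    total = comb(n_genes, order)
    with_one_gene = comb(n_genes - 1, order - 1)
    return total, with_one_gene


n_genes = 5000  # assumed exact value for "some 5000 genes"

for order in (2, 3):
    total, with_one_gene = count_combinations(n_genes, order)
    print(
        f"order {order}: {total:,} combinations in total, "
        f"{with_one_gene:,} containing one chosen gene"
    )
```

Even at order 2 there are millions of candidate pairs, and at order 3 the count runs into the tens of billions, which is why only a small share can be documented gene by gene.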
Hope this small contribution is of some help in research work in cell/cancer biology.
Acknowledgement - Special thanks to Mrs. Rita Sinha and the late Mr. Prabhat Sinha for their financial support, without which this work would not have been possible. Thanks to the deep teachings of the Anapana and Vipassana meditations by the Buddha, which I learnt from the Vipassana Research Institute, founded by Vipassana Acharya Satya Narayana Goenka, and as expounded in the Satipatthana Sutta, which carried me through the entire duration of the research.
best regards & take care
shriprakash