Machine learning with whole genome sequence data in cancer research?
6.3 years ago
dan ▴ 20

What are some use cases for applying machine learning techniques to whole genome sequence data in cancer research? I'm not interested in variant calling or in image analysis (e.g. deciding whether a tumour is malignant or not), just in the analysis of whole genome sequence data (tumour and normal) for cancer research.

cancer genome • 1.5k views

Well, you might not be explicitly interested in variant calling and such, but that's what whatever method you use will end up doing under the hood. The whole point of doing genomic sequencing in cancer is to find what's different, and whatever you end up predicting, classifying, etc. will depend on that.


Yes, I agree that it will depend on that, but I'm looking for something further down the line, once the calling has been done.


I would think that classifying subtypes would be useful, for example into "currently druggable" and not.


Can you provide a link or two to some examples, please?


I don't know if such examples even exist; that's a project idea. (I'm doubtful that it'd go anywhere, but then I think much of the machine learning stuff in biology is going nowhere.)


"but then I think much of the machine learning stuff in biology is going nowhere"

...and I independently agree with Devon here. I wrote more about this here: A: What is the best way to combine machine learning algorithms for feature selectio

It feels as if the 'wave' and hype of machine learning have already passed, with only some remnants remaining. Maybe we can now get back to being serious about solving the issues that we face in health sciences, instead of jumping from one trend to another and always avoiding those issues.


You may want to look at the classification algorithms that have been developed as non-coding pathogenicity predictors. I put together a very long presentation on these algorithms but cannot share it. Nevertheless, the work was interesting enough to note here for future reference:

  • CADD (germline variants)
  • DANN (germline variants)
  • FATHMM-MKL (germline variants)
  • GWAVA (germline variants | somatic mutations)
  • FunSeq2 (somatic mutations)
  • SuRFR (rare variants | complex disease variants | all other variants)

These tools mostly use 'machine learning' algorithms. Ironically, some of the papers show that standard logistic regression performs comparably to, or better than, the very tool they are reporting, yet they were still published.
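To make the point about baselines concrete, here is a minimal sketch of how one might check a new predictor against plain logistic regression before claiming an improvement. Everything in it is an assumption for illustration: the data are synthetic (two made-up per-variant features and a made-up label), the "hypothetical tool" is just a stand-in score that only sees one feature, and none of it reproduces how any of the tools listed above were actually trained or evaluated.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit plain logistic regression by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n, rng):
    """Synthetic stand-in for per-variant annotation features and labels (not real data)."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y.append(1 if x[0] + x[1] + rng.gauss(0, 0.5) > 0 else 0)
        X.append(x)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(400, rng)
X_test, y_test = make_synthetic(200, rng)

# Baseline: logistic regression on all available features.
w, b = fit_logistic(X_train, y_train)
baseline_auc = auc(predict(w, b, X_test), y_test)

# Hypothetical "tool being reported": here it simply scores by the first feature.
tool_auc = auc([x[0] for x in X_test], y_test)

print(f"logistic regression baseline AUC: {baseline_auc:.3f}")
print(f"hypothetical tool AUC:            {tool_auc:.3f}")
```

In practice the two scores would come from the real tool and from a baseline fitted on the same features and the same held-out variants; the only point of the sketch is that the comparison is cheap to run.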
