Question

Integrative analysis of omics studies using machine learning

0

6.8 years ago

FX ▴ 20

Hi All,

I would like to use public omics datasets (ChIP-seq, RNA-seq, and ATAC-seq) from different studies to do an integrative analysis as follows:

1. Normalise samples, within each omics type, across the different public datasets.
2. Convert the normalised values to a uniform scale so that ChIP-seq, RNA-seq and ATAC-seq can be compared.
3. Feed the normalised, uniformly scaled values into a machine learning model to infer one feature (e.g. RNA expression) from other features (e.g. TF or histone-mark ChIP-seq).
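
To make the three steps concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not a recommended pipeline: the data are simulated counts, each row is treated as one gene, the "uniform scale" is simply a log transform followed by a per-feature z-score, and the model is a closed-form ridge regression. The helper names `zscore` and `fit_ridge` are hypothetical, and batch effects between studies are not handled at all.

```python
import numpy as np

rng = np.random.default_rng(0)

def zscore(x):
    """Standardise each column (feature) to mean 0 and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression without intercept (inputs are centred)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic stand-in data (assumption): 200 genes, 3 ChIP-seq marks and
# 1 ATAC-seq signal per gene, all as raw counts.
n_genes = 200
chip = rng.poisson(lam=50, size=(n_genes, 3)).astype(float)
atac = rng.poisson(lam=20, size=(n_genes, 1)).astype(float)

# Hypothetical ground truth: log expression depends on the first ChIP-seq
# mark and on accessibility, plus a little noise.
log_rna = (0.8 * np.log1p(chip[:, 0])
           + 0.5 * np.log1p(atac[:, 0])
           + rng.normal(0, 0.05, n_genes))

# Steps 1 and 2: log-transform each assay, then z-score every feature so
# that ChIP-seq and ATAC-seq signals share a scale. For simplicity the
# scaling uses all rows; a real analysis would fit it on training data only.
X = np.hstack([zscore(np.log1p(chip)), zscore(np.log1p(atac))])
y = log_rna - log_rna.mean()

# Step 3: train on 150 genes, evaluate on the remaining 50.
train, test = np.arange(0, 150), np.arange(150, n_genes)
coef = fit_ridge(X[train], y[train])
pred = X[test] @ coef
r = np.corrcoef(pred, y[test])[0, 1]

print("coefficients:", np.round(coef, 3))
print("held-out Pearson r:", round(r, 3))
```

On real data the hard part is everything this sketch leaves out: study-specific batch effects, differing library sizes and protocols, and the choice of how to summarise ChIP-seq or ATAC-seq signal per gene.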

Does anyone have experience with this type of analysis? I would like to hear about preferable approaches, problems, caveats, etc. that I need to worry about or take care of before I start working on it.

Many thanks.

RNA-Seq ChIP-Seq ATAC-seq Machine Learning • 1.8k views

ADD COMMENT • link updated 5.9 years ago by Kevin Blighe 89k • written 6.8 years ago by FX ▴ 20

0

Hi Firas, we are interested in doing exactly the same thing for different plant mutants. Have you had a positive experience with that?

We can exchange experiences by email, if you're interested!

ADD REPLY • link 5.9 years ago by rnadi • 0

0

Please use ADD COMMENT, not the answer field. FYI, the OP has not logged in for 9 months, so you are unlikely to get a response.

ADD REPLY • link 5.9 years ago by ATpoint 89k

score 3 · Answer 1 · 2019-10-03

The big question back to you is this: which biological question (or questions) are you hoping to answer by performing a 'multi-omics' analysis and 'machine learning'? From what I can see, and please correct me if I am incorrect on this, the majority of people are doing these studies without knowing the end goal, and/or without even understanding why they are doing them in the first place.

I write on related topics, here:

You can try your best to process these datasets to the point where they follow the same data distribution; however, by then, you may have processed them so heavily that much of the information has been lost. Elsewhere, for example, I saw a Scientific Reports manuscript in which the authors encoded each 'omics' dataset as a kernel and summarised the data into matrices of binary values, which seems meaningless to me.
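
As a rough illustration of that information loss, here is a small sketch on simulated values. The assumptions are mine: two made-up "assays" with very different dynamic ranges, quantile normalisation as the way of forcing them onto one distribution, and a median threshold as a crude stand-in for binarisation. It is not the method used in the manuscript mentioned above, and `quantile_normalise` is a hypothetical helper.

```python
import numpy as np

def quantile_normalise(columns):
    """Force every column onto one shared distribution
    (the mean of the sorted columns)."""
    ranks = np.argsort(np.argsort(columns, axis=0), axis=0)
    reference = np.sort(columns, axis=0).mean(axis=1)
    return reference[ranks]

rng = np.random.default_rng(1)

# Two simulated assays with different location and spread (assumption).
a = rng.lognormal(mean=2.0, sigma=0.5, size=1000)
b = rng.lognormal(mean=4.0, sigma=1.5, size=1000)
data = np.column_stack([a, b])

qn = quantile_normalise(data)

# After normalisation both columns hold exactly the same set of values,
# so any real difference in scale or spread between assays is gone.
same_distribution = np.allclose(np.sort(qn[:, 0]), np.sort(qn[:, 1]))

# Binarising at the median keeps a single bit per measurement.
binary = (qn > np.median(qn, axis=0)).astype(int)
distinct_before = len(np.unique(data[:, 0]))
distinct_after = len(np.unique(binary[:, 0]))

print("median ratio before:", np.median(b) / np.median(a))
print("median ratio after: ", np.median(qn[:, 1]) / np.median(qn[:, 0]))
print("identical distributions after:", same_distribution)
print("distinct values before / after binarising:",
      distinct_before, "/", distinct_after)
```

Only the rank order within each assay survives the first step, and only "above or below the median" survives the second, so whether that is acceptable depends entirely on the biological question being asked.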

Kevin