Question: Integrating NGS Data for Machine Learning

Asked 6.5 years ago by email.egail

I want to use a machine learning algorithm to predict whether a particular gene is expressed, based on the binding of multiple histone marks/proteins at that gene (most likely measured by ChIP-seq).

The input would be a matrix organized by region (like a BED file), with columns recording, for example, whether the region has a called peak (from ChIP-seq), whether the gene is expressed (from RNA-seq), and any other NGS data that could be integrated.
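A matrix like the one described could be assembled roughly as follows. This is a minimal pure-Python sketch, assuming the regions, the peaks for each mark and the set of expressed genes are already loaded into memory; `build_matrix`, `regions`, `peaks`, `expressed` and the toy coordinates are hypothetical, and the naive all-pairs overlap test is only for illustration (bedtools is far faster on real data).

```python
from typing import Dict, List, Set, Tuple

# A BED-style interval (chrom, start, end): 0-based, half-open, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_matrix(
    regions: Dict[str, Interval],
    peaks: Dict[str, List[Interval]],
    expressed: Set[str],
) -> Tuple[List[str], List[list]]:
    """Return a header and one row per region: [name, 0/1 per mark..., 0/1 expressed]."""
    marks = sorted(peaks)  # fixed, reproducible column order
    header = ["region"] + marks + ["expressed"]
    rows = []
    for name, region in sorted(regions.items()):
        # 1 if any peak of this mark overlaps the region (naive all-pairs check).
        features = [int(any(overlaps(region, p) for p in peaks[m])) for m in marks]
        rows.append([name] + features + [int(name in expressed)])
    return header, rows


# Toy example with hypothetical coordinates and mark names.
regions = {"geneA": ("chr1", 1000, 2000), "geneB": ("chr1", 5000, 6000)}
peaks = {"H3K4me3": [("chr1", 1500, 1600)], "H3K27me3": [("chr1", 5900, 6100)]}
header, rows = build_matrix(regions, peaks, expressed={"geneA"})
print(header)
for row in rows:
    print(row)
```

Each row is one training example and the last column is the label to predict; the 0/1 peak calls could be swapped for read counts or signal values without changing the layout.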

However, I am running into some issues:

I'm having trouble integrating the RNA-seq and ChIP-seq data. I'm trying to use the bedtools intersect command, but it returns no results:

bedtools intersect -a ref.bed -b fileA.bed fileB.bed > output.bed

Is there another or better way to find the overlaps?
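Before switching tools, it may be worth checking why the command returns nothing. One frequent cause (an assumption here, since the post does not show the files) is that the files name chromosomes differently, e.g. `chr1` in one and `1` in the other, in which case no interval can ever overlap. A minimal sketch of that check; `chrom_names` and `compare_chroms` are hypothetical helpers:

```python
from typing import Iterable, Set, Tuple


def chrom_names(bed_lines: Iterable[str]) -> Set[str]:
    """Chromosome names from column 1, skipping blank, comment, track and browser lines."""
    names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def compare_chroms(a_lines: Iterable[str], b_lines: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared, only in A, only in B) chromosome names."""
    a, b = chrom_names(a_lines), chrom_names(b_lines)
    return a & b, a - b, b - a


# With real files:
#   with open("ref.bed") as fa, open("fileA.bed") as fb:
#       shared, only_a, only_b = compare_chroms(fa, fb)
ref = ["chr1\t1000\t2000\tgeneA\n", "chr2\t500\t900\tgeneB\n"]
peaks = ["track name=peaks\n", "1\t1500\t1600\tpeak1\n"]
shared, only_ref, only_peaks = compare_chroms(ref, peaks)
print("shared:", sorted(shared))
print("only in ref:", sorted(only_ref))
print("only in peaks:", sorted(only_peaks))
```

If `shared` comes back empty, bedtools cannot report any overlap no matter how `-b` is written, and the names need to be harmonized first.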

Ideally, I would like to use multiple cell types so that the model generalizes. However, this adds a third dimension to my data (regions × features × cell types), and all of the tools I am familiar with only accept two-dimensional data. How should I best incorporate this extra dimension into my dataset?
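A simple first step, sketched here as an assumption rather than as a recommendation from the thread, is to flatten the regions × features × cell types data back into two dimensions. In the long format each (gene, cell type) pair becomes its own row, with the cell type one-hot encoded; in the wide format each gene keeps a single row and the per-cell-type feature blocks are placed side by side (the mode-1 unfolding of the tensor). The input layout, function names and toy values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: per_cell[cell_type][gene] -> feature vector (e.g. 0/1 peak calls),
# labels[cell_type][gene] -> 0/1 expression call in that cell type.
PerCell = Dict[str, Dict[str, List[int]]]


def to_long(per_cell: PerCell, labels: Dict[str, Dict[str, int]], cell_types: List[str]):
    """One row per (gene, cell type); a one-hot block records which cell type the row is from."""
    X, y, ids = [], [], []
    for ct in cell_types:
        one_hot = [int(ct == other) for other in cell_types]
        for gene in sorted(per_cell[ct]):
            X.append(per_cell[ct][gene] + one_hot)
            y.append(labels[ct][gene])
            ids.append((gene, ct))
    return X, y, ids


def to_wide(per_cell: PerCell, cell_types: List[str]) -> Tuple[List[str], List[List[int]]]:
    """One row per gene present in every cell type; feature blocks concatenated in cell_types order."""
    genes = sorted(set.intersection(*(set(per_cell[ct]) for ct in cell_types)))
    rows = [sum((per_cell[ct][g] for ct in cell_types), []) for g in genes]
    return genes, rows


per_cell = {
    "liver": {"g1": [1, 0], "g2": [0, 1]},
    "brain": {"g1": [0, 0], "g2": [1, 1]},
}
labels = {"liver": {"g1": 1, "g2": 0}, "brain": {"g1": 0, "g2": 1}}
cell_types = ["liver", "brain"]

X, y, ids = to_long(per_cell, labels, cell_types)
genes, W = to_wide(per_cell, cell_types)
print(X, y)
print(genes, W)
```

The long format works with any standard classifier, but rows from the same gene are not independent, so it is usually safer to keep all rows of a gene in the same cross-validation fold. The wide format keeps one row per gene, but then there is one label per cell type, which makes it a multi-output problem.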

Tags: rna-seq, ChIP-Seq, machine learning

Last updated 6.5 years ago by timpaines

Data with more than two dimensions are generally called tensors in the machine learning and data mining communities. There are several ways forward, depending on your data. You could try tensor regression or support tensor regression; you could use kernels on tensors and fall back on standard kernel methods; or you could use tensor factorization to project your data into a latent feature space where standard 2D methods apply. If you're into the current deep learning fashion, you could also use a neural network to extract features for use with a more standard machine learning method.

Reply by Jean-Karim Heriche, 6.5 years ago
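To illustrate the tensor factorization route, here is a minimal pure-Python sketch of its simplest case: a rank-1 CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares. It assumes a dense regions × features × cell types array stored as nested lists; `cp_rank1` is a hypothetical helper, and real analyses would use a higher rank and an established implementation (for example the CP and Tucker decompositions in rTensor). The region factor `a` is the kind of latent feature that could then be passed to a standard 2D method.

```python
import math
import random
from typing import List, Tuple

Tensor3 = List[List[List[float]]]


def cp_rank1(T: Tensor3, iters: int = 50, seed: int = 0) -> Tuple[float, List[float], List[float], List[float]]:
    """Fit T[i][j][k] ~ lam * a[i] * b[j] * c[k] by alternating least squares.

    Assumes iters >= 1 and that T is not all zeros (otherwise the updates divide by zero).
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    rng = random.Random(seed)
    b = [rng.random() + 0.1 for _ in range(J)]
    c = [rng.random() + 0.1 for _ in range(K)]

    def sq(v: List[float]) -> float:
        return sum(x * x for x in v)

    for _ in range(iters):
        # Each update is the exact least-squares solution with the other two factors fixed.
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / (sq(b) * sq(c))
             for i in range(I)]
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) / (sq(a) * sq(c))
             for j in range(J)]
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) / (sq(a) * sq(b))
             for k in range(K)]

    # Move the overall scale into lam so that a, b and c have unit length.
    na, nb, nc = math.sqrt(sq(a)), math.sqrt(sq(b)), math.sqrt(sq(c))
    return na * nb * nc, [x / na for x in a], [x / nb for x in b], [x / nc for x in c]


# Toy check: an exactly rank-1 tensor (3 regions x 2 features x 4 cell types).
a0, b0, c0 = [1.0, 2.0, 3.0], [1.0, -1.0], [2.0, 1.0, 0.5, 1.0]
T = [[[x * y * z for z in c0] for y in b0] for x in a0]
lam, a, b, c = cp_rank1(T)
err = max(abs(T[i][j][k] - lam * a[i] * b[j] * c[k])
          for i in range(3) for j in range(2) for k in range(4))
print("max reconstruction error:", err)
```

With rank R > 1 the same idea applies, but each update becomes a small least-squares solve over R columns instead of a scalar division.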

Out of interest, Jean-Karim, if you are working in this area, which programs / resources are you using?

Reply by Kevin Blighe, 6.5 years ago

I assume you mean tensors, not deep learning. For this, I am using R with the rTensor package as the basis for my own functions (e.g. tensor ridge regression). There is also the nnTensor package for non-negative tensor factorizations.

Reply by Jean-Karim Heriche, 6.5 years ago
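For readers curious what a tensor ridge regression can look like, here is one common formulation, sketched in pure Python; it is not necessarily what the functions mentioned above do. Each sample (e.g. one gene) is a features × cell types matrix X_n, and the coefficient matrix is constrained to rank 1, so y_n ≈ u^T X_n v with a ridge penalty lam * (|u|^2 + |v|^2). Fixing v turns the problem into ordinary ridge regression in u and vice versa, so the fit alternates between the two. All names and the toy data are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge(Z: Matrix, y: List[float], lam: float) -> List[float]:
    """Ordinary ridge regression: solve (Z^T Z + lam I) w = Z^T y."""
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    rhs = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return solve(A, rhs)


def predict(X: Matrix, u: List[float], v: List[float]) -> float:
    """u^T X v for one sample X (features x cell types)."""
    return sum(u[p] * X[p][q] * v[q] for p in range(len(u)) for q in range(len(v)))


def tensor_ridge_rank1(Xs: List[Matrix], y: List[float], lam: float = 1e-3,
                       iters: int = 100, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Alternating ridge fits of u (features) and v (cell types) for y_n ~ u^T X_n v."""
    P, Q = len(Xs[0]), len(Xs[0][0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(Q)]
    u = [0.0] * P
    for _ in range(iters):
        # With v fixed, sample n contributes the P-vector X_n v: a plain ridge problem in u.
        u = ridge([[sum(X[p][q] * v[q] for q in range(Q)) for p in range(P)] for X in Xs], y, lam)
        # With u fixed, sample n contributes the Q-vector X_n^T u: a plain ridge problem in v.
        v = ridge([[sum(X[p][q] * u[p] for p in range(P)) for q in range(Q)] for X in Xs], y, lam)
    return u, v


# Toy data: 40 samples of 3 features x 2 cell types, generated from a known rank-1 model.
rng = random.Random(1)
u_true, v_true = [1.0, -2.0, 0.5], [1.5, -1.0]
Xs = [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)] for _ in range(40)]
y = [predict(X, u_true, v_true) for X in Xs]
u, v = tensor_ridge_rank1(Xs, y, lam=1e-6)
rel_err = sum((predict(X, u, v) - t) ** 2 for X, t in zip(Xs, y)) / sum(t * t for t in y)
print("relative training error:", rel_err)
```

Compared with vectorizing each X_n and running plain ridge regression, the rank-1 constraint needs P + Q coefficients instead of P × Q, which is the main appeal when there are many features and cell types.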

Regarding the bedtools command: try -b fileA.bed,fileB.bed

Reply by GouthamAtla, 6.5 years ago