Machine Learning - RNAseq
8.7 years ago
David_emir ▴ 500

Hi All,

Greetings !!!

I have analysed TCGA breast cancer RNA-seq data for about 1200 samples, divided into normal vs. cancer. Starting from the fastq files, I successfully ran an RNA-seq pipeline to get a list of differentially expressed genes (DGE). I don't want to stop at this stage: I have a huge amount of data on my table and want to use it to the maximum extent, so I would like to apply machine learning (ML) to it. However, I have not been able to find a protocol or any good papers on this. I searched Google, but the majority of the resources are aimed at computer science and I am failing to relate them to biology.

Please help me find protocols for applying ML to RNA-seq data (other than MLSeq, the Bioconductor package) or any other good resources. I want to identify a few genes that qualify as biomarkers for breast cancer. I may sound stupid, but I am curious to apply ML to my TBs of data.

Thanks a lot for your kind help,

Regards, Emir, David

RNA-Seq Machine learning training data • 5.9k views

Hi David_emir, how did you download the fastq files for these RNA-seq samples? Could you tell me?


Since your post is not an answer to this thread, I have moved it to a comment. Perhaps this question would fit better in a separate thread. It's quite possible that David_emir will not see this thread soon (he was last online 4 weeks ago).

8.7 years ago
Amitm ★ 2.3k

One resource is already on this site - Machine Learning For Cancer Classification - Part 1 - Preparing The Data Sets

The dataset used there is microarray data, but once you have processed your RNA-seq data to get counts, the downstream steps should be more or less the same.
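One caveat on "more or less the same": raw counts depend on sequencing depth, so they are usually scaled by library size and log-transformed before going into a classifier. Below is a minimal sketch of one common choice, log2 counts per million. The function name `log_cpm`, the `prior` value and the toy numbers are all made up for illustration; dedicated packages offer more careful normalizations.

```python
import math

def log_cpm(counts, prior=1.0):
    """Convert a samples x genes table of raw counts to log2(CPM + prior).

    counts: list of rows, one row per sample, one raw count per gene.
    prior:  small constant added before the log so that zero counts are defined.
    """
    result = []
    for row in counts:
        library_size = sum(row)  # total counts in this sample
        if library_size == 0:
            raise ValueError("sample has zero total counts")
        result.append(
            [math.log2(c / library_size * 1e6 + prior) for c in row]
        )
    return result

# Toy data (made up): two samples, three genes.
# The second sample has the same composition but 10x the sequencing depth.
toy_counts = [
    [10, 90, 900],
    [100, 900, 9000],
]
normalized = log_cpm(toy_counts)
```

After this transformation the two toy samples get essentially identical values, because only their depth differed and not their relative expression.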

Apart from this, a recent paper on an RNA-seq-based classifier came out: Here. I am sure there are many more.


Sir, how do we get counts from RNA-seq data? I downloaded samples from TCGA. Could you please tell me how to convert RNA-seq data to expression values and use them for machine learning? Thanks in advance.

7.7 years ago
paul.e.gradie ▴ 110

So many months ago! Sorry this wasn't seen sooner - I'm sure you've resolved this problem already.

Search for tools such as HTSeq and Subread. These packages contain tools for quantifying read counts, which are then commonly used for downstream analysis: htseq-count and featureCounts, respectively.
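Once the counting is done, the table still has to be loaded for the ML step. Here is a rough sketch, assuming featureCounts' default tab-delimited output: a leading comment line starting with `#`, then a header with six annotation columns (Geneid, Chr, Start, End, Strand, Length) followed by one column per input file. The function name `read_featurecounts`, the gene IDs and the sample names are invented for the example.

```python
import csv
import io

ANNOTATION_COLUMNS = 6  # Geneid, Chr, Start, End, Strand, Length

def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [count per sample]}) from a count table."""
    # Skip the leading '#' comment line(s) before handing lines to the CSV reader.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[ANNOTATION_COLUMNS:]
    counts = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        counts[row[0]] = [int(value) for value in row[ANNOTATION_COLUMNS:]]
    return samples, counts

# Made-up two-gene table in the assumed layout (not real TCGA data).
example = io.StringIO(
    "# made-up example in featureCounts layout\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tnormal_1.bam\ttumor_1.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t12\t340\n"
    "geneB\tchr2\t500\t1500\t-\t1001\t57\t49\n"
)
samples, counts = read_featurecounts(example)
```

For a real file you would pass an open file handle instead of the `io.StringIO` object.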

The original question was quite vague, and there are many machine learning algorithms that could be considered in general, though there may only be a few you would actually use. Your experimental design should include the need to predict something, which means you'd need to hold out maybe a third or a quarter of your data for validation by splitting off a portion of that monstrous 1200-sample dataset. But then you'd need to decide what you'd like to try and predict.
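The hold-out idea can be sketched as below. With unbalanced groups it usually helps to split within each class so the validation set keeps the same normal/tumor ratio. The helper name `stratified_holdout` and the 1100/100 class sizes are assumptions for illustration only, not the real TCGA numbers.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, holdout_fraction=0.25, seed=0):
    """Split sample indices into (train, holdout), preserving class proportions."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        train.extend(indices[n_holdout:])
    return sorted(train), sorted(holdout)

# Made-up label vector: 1100 tumor and 100 normal samples.
labels = ["tumor"] * 1100 + ["normal"] * 100
train_idx, holdout_idx = stratified_holdout(labels, holdout_fraction=0.25)
```

The hold-out samples should be set aside before any gene selection or model tuning, otherwise the validation result will look better than it really is.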

The link above uses sample data to train a randomForest classifier that should be able to predict cancer relapse given a particular genetic profile. Are you trying to create a similar predictive model? scikit-learn is a Python package for easy-to-use, out-of-the-box machine learning that lets you implement random forests, SVMs, clustering, and a variety of other learning algorithms that may be useful in this area. Are you trying to identify patterns across data sets? Or perhaps you'd like to train a neural network to recognize these patterns in your data, something that may be possible considering the magnitude of your data set, provided it's an accurate one. If there is anyone around with expertise in building neural classification networks, speak to them about using TensorFlow to build and test a custom network that is able to recognize cancer profiles.
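To make the scikit-learn route a bit more concrete, here is a minimal sketch of a random forest for normal vs. tumor, with genes ranked by feature importance. Everything in it is synthetic: the expression matrix is random noise with three planted "marker" genes, and the gene names are placeholders. With real data you would swap in your normalized expression matrix and sample labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
gene_names = [f"gene_{i}" for i in range(n_genes)]  # placeholder names

# Synthetic stand-in for a normalized expression matrix (samples x genes).
y = np.repeat([0, 1], n_samples // 2)   # 0 = normal, 1 = tumor
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :3] += 3.0                    # only the first three genes differ

# Hold out a quarter of the samples, keeping the class ratio the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)    # fraction correct on held-out samples

# Rank genes by impurity-based importance, highest first.
ranking = np.argsort(clf.feature_importances_)[::-1]
top_genes = [gene_names[i] for i in ranking[:3]]
print(f"hold-out accuracy: {accuracy:.2f}, top genes: {top_genes}")
```

A high importance score only means the forest found the gene useful for separating these samples. Treat the top genes as candidates to follow up, not as validated biomarkers.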

Sounds like you actually have an interesting opportunity to perform a cool study with so much data. Let us know what went down.
