feature selection using random forest
4.9 years ago
newbie ▴ 130

Hi,

I need a little help. I have about a hundred samples that I have already classified into four classes (clusters). Now I want to identify the set of genes that best discriminates between these classes, including both the up- and down-regulated genes in each class.

For this I have already used a t-test, but I would like to apply random forest for feature selection. My data looks like the following (just some example data):

[image: example data]

Can anyone please tell me how to apply random forest to the above data to find out which genes classify the samples into the different classes? Thank you.

RNA-Seq R randomforest featureselection • 3.3k views

Which type of data is this: RNA-Seq or microarray?


It is RNA-seq data with 100 samples.

4.9 years ago
dsull ★ 7.1k

You have four classes. Why are you using a t-test? You should be using ANOVA.
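A per-gene one-way ANOVA could be sketched in base R roughly as follows. The data here are simulated stand-ins (the names `expr` and `classes`, the matrix size, and the shifted genes are all assumptions for illustration), and the sketch assumes log-scale, normalized expression values rather than raw counts.

```r
# Simulated stand-in for the real data: 200 genes x 100 samples (hypothetical).
set.seed(1)
n_genes   <- 200
n_samples <- 100
classes <- factor(rep(paste0("C", 1:4), each = n_samples / 4))

# Log-scale expression values; the first 10 genes are shifted up in class C1.
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_len(n_samples))))
expr[1:10, classes == "C1"] <- expr[1:10, classes == "C1"] + 2

# One-way ANOVA per gene: does mean expression differ among the four classes?
anova_p <- apply(expr, 1, function(g) {
  fit <- aov(g ~ classes)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Correct for testing many genes at once (Benjamini-Hochberg FDR).
anova_fdr <- p.adjust(anova_p, method = "BH")
sig_genes <- names(anova_fdr)[anova_fdr < 0.05]
head(sig_genes)
```

A significant ANOVA only says that some class differs; a post-hoc comparison (for example `TukeyHSD` on the `aov` fit) would be needed to see which class is up or down for a given gene.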

Second, since random forest reports feature importances, you can combine random forest with recursive feature elimination (look up "recursive feature elimination with cross-validation") to find the set of features with the best predictive value.
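As a starting point, feature importances can be read off a fitted forest with the `randomForest` package. This is only a sketch on simulated data: the matrix `x`, the labels `classes`, and the four artificially informative genes are assumptions, and the number of trees is an arbitrary choice.

```r
library(randomForest)

set.seed(42)
n_samples <- 100
n_genes   <- 50
classes <- factor(rep(paste0("C", 1:4), each = n_samples / 4))

# Samples in rows, genes in columns (the layout randomForest expects).
x <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
            dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Make gene1..gene4 informative: gene k is shifted up in class k.
for (k in 1:4) {
  in_class <- classes == paste0("C", k)
  x[in_class, k] <- x[in_class, k] + 3
}

# importance = TRUE is needed to get permutation-based importances.
rf <- randomForest(x = x, y = classes, ntree = 500, importance = TRUE)

# Permutation importance (mean decrease in accuracy), highest first.
imp <- importance(rf, type = 1)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
head(ranked, 10)
```

Importances from a single fit can be unstable with about 100 samples, which is one reason to wrap the ranking in cross-validated feature elimination rather than trusting one ordering.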


Thank you. If you don't mind, could you please give me an example of how to do this with the above data? I'm very new to this type of analysis. Thanks again.


Here's an example:

https://topepo.github.io/caret/recursive-feature-elimination.html
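In the spirit of that page, a minimal sketch with caret's `rfe` and its built-in random forest helpers (`rfFuncs`) might look like this. The data are simulated, and the candidate subset sizes and the 5-fold cross-validation are arbitrary assumptions that would need adapting to real data.

```r
library(caret)
library(randomForest)

set.seed(42)
n_samples <- 100
n_genes   <- 50
classes <- factor(rep(paste0("C", 1:4), each = n_samples / 4))

# Simulated expression: samples in rows, genes in columns.
x <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
            dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
for (k in 1:4) {
  in_class <- classes == paste0("C", k)
  x[in_class, k] <- x[in_class, k] + 3   # gene k marks class k
}
x_df <- as.data.frame(x)

# Recursive feature elimination with random forest, scored by 5-fold CV.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x = x_df, y = classes,
               sizes = c(2, 4, 8, 16),   # candidate subset sizes to compare
               rfeControl = ctrl)

rfe_fit               # resampled accuracy for each subset size
predictors(rfe_fit)   # genes in the selected subset
```

With thousands of genes it is usually worth pre-filtering (for example by variance) before running this, since each subset size is refit in every fold.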

If you're new, it's unfortunately going to take some effort to read tutorials and write code. Using advanced supervised machine learning methods properly is not trivial (e.g. you'll need to understand hyperparameter tuning, metrics for measuring model performance, cross-validation, multiclass classification, etc.). Also, 100 samples is quite small, so I wonder why you want to use random forests in the first place rather than selecting features with simpler generalized linear models (e.g. DESeq2).
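For the generalized linear model route, one option with DESeq2 is a likelihood ratio test across all four classes at once. The sketch below uses simulated raw counts; the column name `cluster`, the matrix size, and the effect sizes are assumptions made for illustration only.

```r
library(DESeq2)

set.seed(1)
n_genes   <- 500
n_samples <- 100
classes <- factor(rep(paste0("C", 1:4), each = n_samples / 4))

# Simulated raw counts (genes in rows, samples in columns).
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
# The first 20 genes are about 4-fold higher in class C1.
counts[1:20, classes == "C1"] <- rnbinom(20 * 25, mu = 400, size = 5)

coldata <- data.frame(cluster = classes, row.names = colnames(counts))
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ cluster)

# LRT: full model (~ cluster) against an intercept-only model, i.e.
# "does this gene differ between any of the classes?"
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res <- results(dds)

# The p-values come from the LRT; the reported log2 fold change is for a
# single pairwise contrast only.
res_ordered <- res[order(res$padj), ]
head(res_ordered)
```

To find genes up or down in one particular class, pairwise Wald contrasts (the `contrast` argument of `results`) on a default `DESeq` fit would be the more direct follow-up.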
