Using DESeq2 results for building a classifier
6.7 years ago
bioinfo456 ▴ 150

Hi all,

I am using DESeq2 to obtain a set of differentially expressed genes for 3 different subtypes of a certain cancer. I then want to build a classifier from those genes that distinguishes normal samples from samples of the 3 cancer subtypes.

Is there an alternative way to confirm that the DESeq2 output is appropriate for this purpose?

deseq2 RNA-Seq differentially expressed genes • 3.5k views
ADD COMMENT
3

Differential expression and sample classification are two quite different problems. If you want to detect differentially expressed genes, then DESeq2 is OK (dude, you've been asking the same question for a fortnight, just do the experiment).

ADD REPLY
4

To go from differential expression all the way to a classifier, please take a look at my answer here: What is the best way to combine machine learning algorithms for feature selection such as Variable importance in Random Forest with differential expression analysis?

Although, I fear that it may be overly complex for you, i.e., if you are already struggling with the differential expression part.

----------------------

Uday, in addition, please follow up with the other comments by Wouter and Sean. If you engage more, you will learn more and people will likely help you more.

Kevin

ADD REPLY
0

I don’t feel the need to apply yet another statistical approach like stepwise regression, since DESeq2 already involves the concept of a p-value. Correct me if I’m wrong. Testing one gene at a time manually is out of the question. Do you reckon there are any other ways of doing it?

ADD REPLY
0

With all due respect sir, I’m actually done doing what I felt is right. At the moment, I’m clarifying things. I’ll be glad if you could help. Otherwise, kindly ignore my posts. Thanks.

ADD REPLY
0

Okay, that is great to hear, Uday! Please stay in close touch.

ADD REPLY
0

Renal cell cancer is broadly divided into 3 classes (i.e., KICH, KIRP and KIRC). I used DESeq2 to identify DEGs for each class. Suppose x is the resulting set of genes for KICH, and similarly y and z for KIRP and KIRC. I eliminated all intersections between the classes, i.e., s = (x ∪ y ∪ z) − ((x ∩ y ∩ z) ∪ (x ∩ y) ∪ (y ∩ z) ∪ (z ∩ x)), which left approximately 5k genes. I have 1000 samples, which I split into 700 for training and 300 for testing. I extracted the genes in s from the training set and trained a classifier model on them. I obtained some significant results.
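The set arithmetic above keeps the genes that are differentially expressed in exactly one subtype. A minimal Python sketch of that step follows. The gene identifiers are made up and stand in for the real DESeq2 result lists; only the names x, y, z and s come from the description above, and the helper `subtype_specific_genes` is hypothetical.

```python
from collections import Counter

def subtype_specific_genes(*deg_sets):
    """Return the genes that appear in exactly one of the given DEG sets."""
    counts = Counter(gene for s in deg_sets for gene in set(s))
    return {gene for gene, n in counts.items() if n == 1}

# Hypothetical DEG lists; the real ones would come from the DESeq2 results.
x = {"G1", "G2", "G3", "G7"}   # KICH vs normal
y = {"G2", "G4", "G7"}         # KIRP vs normal
z = {"G3", "G5", "G7"}         # KIRC vs normal

s = subtype_specific_genes(x, y, z)

# The same set written as in the formula: union minus all pairwise overlaps.
# The triple intersection is already contained in the pairwise ones.
s_formula = (x | y | z) - ((x & y) | (y & z) | (z & x))
```

Counting memberships generalizes to more than three subtypes without having to write out every pairwise intersection.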

I’m trying to put together a paper. An existing paper uses DESeq2 on the same data set. My question is: how do I go about writing this paper? Their methodology is completely different. They used as few as 250 DEGs and obtained a result that is lower than what I got. Please share your thoughts. Thanks.

ADD REPLY
0

Thanks for sharing your process. I think publication-level queries are best kept off a Q&A site; it seems a bit presumptuous to me.

ADD REPLY
1

Could you perhaps select a more descriptive title for your threads?

ADD REPLY
1

What is the classifier you want to build? And what samples will you be classifying, relative to the DESeq2 analysis--different samples or the same samples?

ADD REPLY
0

I have samples from 3 different cancer subtypes and their corresponding normals. Each sample has measurements for 20530 genes. Feeding that many features to a classifier is pointless because it doesn’t achieve reasonable classification accuracy. That is why I’m using DESeq2 to extract only those genes that differ between normal and cancer samples, and then building the classifier on those genes. As for the last part of your question, I’m not really sure whether I should use a different set of samples for testing or the same one. I’ll be glad if you could help.

ADD REPLY
2

Well, if you test your classifier using genes that you selected because they are differentially expressed between groups A and B, your classifier is probably going to be good at differentiating between groups A and B, because you biased it severely. That's cheating :) You need an independent set.
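A minimal sketch of what "independent" can look like in practice: split the samples first, select genes using the training samples only, and never let the held-out samples influence the selection. Everything here is an assumption for illustration. The data are synthetic, the mean-difference ranking is only a crude stand-in for a DESeq2 test, and the nearest-centroid rule is a placeholder for whatever classifier you actually use.

```python
import random
from statistics import mean

def select_genes(samples, labels, k):
    """Rank genes by absolute difference in group means and keep the top k.
    This is a crude stand-in for a DESeq2 test, not DESeq2 itself."""
    groups = sorted(set(labels))
    scores = []
    for g in range(len(samples[0])):
        a = mean(s[g] for s, l in zip(samples, labels) if l == groups[0])
        b = mean(s[g] for s, l in zip(samples, labels) if l == groups[1])
        scores.append((abs(a - b), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def fit_centroids(samples, labels, genes):
    """Per-group mean profile over the selected genes."""
    cents = {}
    for group in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == group]
        cents[group] = [mean(r[g] for r in rows) for g in genes]
    return cents

def predict(sample, cents, genes):
    """Assign the group whose centroid is closest (squared Euclidean)."""
    def sq_dist(group):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, cents[group]))
    return min(cents, key=sq_dist)

# Synthetic data: 60 samples, 200 genes, the first 5 genes shifted in group B.
rng = random.Random(0)
n_samples, n_genes, informative = 60, 200, 5
labels = ["A" if i % 2 == 0 else "B" for i in range(n_samples)]
samples = [
    [rng.gauss(5.0 if (lab == "B" and g < informative) else 0.0, 1.0)
     for g in range(n_genes)]
    for lab in labels
]

# 1. Split BEFORE any gene selection.
idx = list(range(n_samples))
rng.shuffle(idx)
train_idx, test_idx = idx[:40], idx[40:]
train_x = [samples[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]
test_x = [samples[i] for i in test_idx]
test_y = [labels[i] for i in test_idx]

# 2. Select genes and fit the classifier on the training samples only.
selected = select_genes(train_x, train_y, k=informative)
cents = fit_centroids(train_x, train_y, selected)

# 3. Evaluate once on the untouched test samples.
correct = sum(predict(s, cents, selected) == y for s, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
```

If the gene selection had been run on all 60 samples before splitting, the test samples would already have shaped the feature list, and the reported accuracy would be optimistic.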

And again: could you perhaps select a more descriptive title for your threads?

ADD REPLY
0

Haha alright sir. Will test using an independent set and get back to you. Suggest me a descriptive title yourself xD.

ADD REPLY
0

"Haha alright sir."

There is no need to be gender specific. Not everyone in science is male.

"Suggest me a descriptive title yourself xD"

We have thousands of questions concerning differential expression analysis. What about something like "Using DESeq2 results for building a classifier", which is A LOT more specific about what you want answers to.

ADD REPLY
1

I’m so sorry if I have offended you. Title changed :).

ADD REPLY
2

For the record, I'm male. I'm not offended, but please avoid such biased assumptions in the future.

ADD REPLY
2

Yes, Wouter is male!

ADD REPLY
