Question

How to test DEGs from microarray data on RNA-seq dataset

19 months ago

Llander • 0

Hi there,

I want to build a predictive score for whether a tissue is diseased or healthy, using the DEGs I obtained from a limma analysis (subsequently refined with stepwise regression; I am currently working on including clinical variables).

I did the original analysis on a GEO microarray dataset (Affymetrix U133A) and I want to test my gene signature on a TCGA dataset. From the literature, this seems to be a controversial topic, so I was wondering if anyone has ideas on how it might be done?

I know that the values and ranges of the two data types are very different, but these are the only two datasets I know of that have the clinical variables I am looking for. Any help or thoughts are greatly appreciated.
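One common way to sidestep the difference in scale is to standardize each gene within its own dataset, so that only relative expression is compared across platforms. The sketch below is an illustration under assumptions, not a recommended pipeline: expression is assumed to be on a log scale already, genes are assumed to be matched by symbol, and the gene names, toy values, and helper functions `zscore_genes` and `signature_score` are hypothetical (they are not part of limma or any library).

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """expr: dict gene -> list of log-scale values, one per sample.
    Returns z-scores computed within this dataset only."""
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        # A gene with zero variance carries no information; map it to zeros.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def signature_score(scaled, up_genes, down_genes):
    """Per-sample score: mean z of up genes minus mean z of down genes."""
    n_samples = len(next(iter(scaled.values())))
    up = [g for g in up_genes if g in scaled]
    down = [g for g in down_genes if g in scaled]
    scores = []
    for i in range(n_samples):
        up_mean = mean(scaled[g][i] for g in up) if up else 0.0
        down_mean = mean(scaled[g][i] for g in down) if down else 0.0
        scores.append(up_mean - down_mean)
    return scores

# Toy RNA-seq-like values (hypothetical): first two samples healthy, last two diseased.
toy_rnaseq = {
    "GENE_UP": [1.0, 2.0, 8.0, 9.0],
    "GENE_DOWN": [9.0, 8.0, 2.0, 1.0],
}
scaled = zscore_genes(toy_rnaseq)
scores = signature_score(scaled, ["GENE_UP"], ["GENE_DOWN"])
```

Note that within-dataset z-scores depend on the cohort composition (for example the ratio of diseased to healthy samples), so this only makes the two datasets roughly comparable; it does not remove platform effects.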

microarray differential limma RNA-seq expression prediction • 1.4k views

ADD COMMENT • link 18 months ago by Llander • 0

You have a gene signature (i.e. a list of [upregulated?] genes). I've had good experience running GSEA (or some related/better tool) on each sample. Give it a try: TCGA has matched normal/tumor tissue RNA-seq, so you have a ground truth; see what you get. If you have DEGs generally, you could also train a classifier on scaled TCGA data using your DEGs as features (divide the TCGA samples into train/validation/test sets).
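As a rough illustration of the train/validation/test idea, here is a minimal sketch of a stratified three-way split in plain Python. The 60/20/20 fractions, the toy label counts, and the function name `stratified_split` are assumptions made for the example, not anything prescribed by TCGA or a particular library.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/validation/test lists,
    keeping roughly the same class balance in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains
    return train, val, test

# Toy labels (hypothetical counts): an imbalanced tumor/normal cohort.
labels = ["tumor"] * 50 + ["normal"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
```

Two caveats apply when adapting this: if tumor and normal samples come from the same patient, the split should be done per patient so that a matched pair never straddles two sets, and any scaling parameters should be estimated on the training set only and then applied to the other two.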

Just my suggestions.

ADD REPLY • link 19 months ago by dsull ★ 6.9k

Hi, I have a similar question. I have some DEGs from my animal model and I plan to validate them in a GEO dataset. Is that a valid approach? The GEO dataset is from patients' blood, while my microarray data is from mouse blood. Could I check whether the human homologs of the animal DEGs are still differentially expressed in the GEO dataset? Then I could train a classifier on the GEO dataset using them as a gene signature.

ADD REPLY • link 19 months ago by Di Wu • 0

Yes, you can definitely do that with the orthologous genes. I've done that in the past: checking whether a gene signature from my mouse tumor model translates to human primary cancers.
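A minimal sketch of the mapping step, assuming you already have a table of (mouse symbol, human symbol) ortholog pairs exported from an orthology resource of your choice. The function name `map_signature_to_human`, the table format, and the `ToyGene*` entries are hypothetical; the sketch keeps only one-to-one orthologs and reports everything else as dropped.

```python
from collections import defaultdict

def map_signature_to_human(mouse_genes, ortholog_pairs):
    """Map mouse gene symbols to human symbols, keeping one-to-one orthologs only.
    ortholog_pairs: iterable of (mouse_symbol, human_symbol) tuples."""
    mouse_to_human = defaultdict(set)
    human_to_mouse = defaultdict(set)
    for mouse, human in ortholog_pairs:
        mouse_to_human[mouse].add(human)
        human_to_mouse[human].add(mouse)
    mapped, dropped = {}, []
    for gene in mouse_genes:
        humans = mouse_to_human.get(gene, set())
        if len(humans) == 1:
            human = next(iter(humans))
            if len(human_to_mouse[human]) == 1:
                mapped[gene] = human
                continue
        # No ortholog, one-to-many, or many-to-one: leave it out of the signature.
        dropped.append(gene)
    return mapped, dropped

# Toy ortholog table: two well-known pairs plus made-up ambiguous entries.
pairs = [
    ("Trp53", "TP53"),
    ("Cd8a", "CD8A"),
    ("ToyGeneA", "TOYGENEA1"),
    ("ToyGeneA", "TOYGENEA2"),  # one-to-many, will be dropped
]
mapped, dropped = map_signature_to_human(
    ["Trp53", "Cd8a", "ToyGeneA", "ToyGeneB"], pairs
)
```

Dropping ambiguous genes is the conservative choice; whether to keep one-to-many orthologs instead is a judgment call that depends on the signature.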

ADD REPLY • link 19 months ago by dsull ★ 6.9k

Thanks dsull. I also wonder: once I have validated the orthologous gene signature in the GEO dataset, what else could I do? My microarray data is from mouse blood, which may not allow further investigation of the mechanism, and in this multi-omics era it may be hard for us to publish a transcriptomics paper in a high-level journal. Could you point me to some similar work to learn from?

ADD REPLY • link 19 months ago by Di Wu • 0

It's difficult to answer because I don't know what scientific question you want to answer or what you hope to discover. You should have a question you're trying to answer or a hypothesis before diving into the data. Running a classifier on data without a good research question in mind will not get you published.

Re: transcriptomics, my recent paper used microarrays and still was published in a good journal: https://www.nature.com/articles/s41388-022-02458-9

If you can discover good biology or something of good medical relevance, that's all that matters.

ADD REPLY • link 19 months ago by dsull ★ 6.9k

The journal you published in is definitely a top-level journal. Thank you for your suggestions. They do give me some ideas, because my research focuses on several models in which the co-DEGs need to be validated.

ADD REPLY • link 19 months ago by Di Wu • 0

I have a list that contains both upregulated and downregulated genes. I have tried GSEA, and those results were quite different from the DEGs I found through limma and did not make much biological sense. I think training a classifier on TCGA data with my DEGs as features and dividing the data into train/test is a great idea, and I will try that out! May I ask what the difference between a validation set and a test set is in this situation? I assumed that I would be able to train on, say, 70% of the data and test on the remaining 30%.
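On the validation vs. test question, the usual distinction is that the training set fits the model, the validation set is used to make choices about the model (a cutoff, the number of genes, a regularization strength), and the test set is touched once at the end to report performance. A minimal sketch, assuming each sample already has a signature score from a model fit on the training set; the scores, labels, candidate cutoffs, and the helpers `accuracy` and `pick_threshold` are all hypothetical toy values and names:

```python
def accuracy(scores, labels, threshold):
    """Fraction of samples classified correctly when score > threshold means diseased (1)."""
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def pick_threshold(val_scores, val_labels, candidates):
    """Choose the cutoff that performs best on the validation set."""
    return max(candidates, key=lambda t: accuracy(val_scores, val_labels, t))

# Toy validation and test data (1 = diseased, 0 = healthy).
val_scores = [-1.2, -0.4, 0.1, 0.9, 1.5]
val_labels = [0, 0, 0, 1, 1]
test_scores = [-0.8, 0.3, 1.1, 2.0]
test_labels = [0, 0, 1, 1]

candidates = [-0.5, 0.0, 0.5, 1.0]
best_threshold = pick_threshold(val_scores, val_labels, candidates)

# The test set is used exactly once, after every choice has been made.
test_accuracy = accuracy(test_scores, test_labels, best_threshold)
```

If nothing needs tuning, a plain 70/30 train/test split is enough. If something does, a common alternative to a fixed validation set is cross-validation within the 70%, still keeping the 30% untouched until the end.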

Thanks kindly for your response!

ADD REPLY • link 18 months ago by Llander • 0