Hi there,
I want to create a predictive score of whether a tissue is diseased or healthy using the DEGs that I obtained from a limma analysis (subsequently refined with stepwise regression methods; I am currently working on including clinical variables).
I did the original analysis on a GEO microarray dataset (Affymetrix U133A) and I want to test my gene signature on a TCGA dataset. From the literature this seems to be a controversial topic, so I was wondering if anyone has any ideas on how this may be performed?
I know that the values and ranges between the two types of data are very different but these are the only two datasets I know of that have the clinical variables that I am looking for. Any help or thoughts are greatly appreciated.
You have a gene signature (i.e. a list of [upregulated?] genes). I've had good experience with running GSEA (or some related/better tool) on each sample. Give it a try: TCGA has matched normal/tumor tissue RNA-seq, so you have a ground truth; see what you get. If you have DEGs generally, you could also train a classifier on scaled TCGA data using your DEGs as features (divide the TCGA samples into train/validation/test sets).
Just my suggestions.
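As a rough illustration of the per-sample scoring idea, here is a minimal sketch. It is not GSEA: it swaps in a much simpler z-score signature score (mean z of the up genes minus mean z of the down genes), computed after standardizing each gene within the dataset so that platform-specific ranges matter less. The gene names, the toy expression values, and the function names are all made up for the example; real input would be log-scale TCGA expression with tumor/normal labels.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """expr: dict gene -> list of per-sample values from ONE dataset (log scale).
    Returns the same structure with each gene standardized across samples."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information; set it to 0.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def signature_scores(expr, up_genes, down_genes=()):
    """Per-sample score: mean z of up genes minus mean z of down genes."""
    scaled = zscore_genes(expr)
    # Signature genes missing from this platform are silently skipped.
    up = [g for g in up_genes if g in scaled]
    down = [g for g in down_genes if g in scaled]
    if not up and not down:
        raise ValueError("no signature genes found in the expression table")
    n_samples = len(next(iter(scaled.values())))
    scores = []
    for i in range(n_samples):
        up_part = mean(scaled[g][i] for g in up) if up else 0.0
        down_part = mean(scaled[g][i] for g in down) if down else 0.0
        scores.append(up_part - down_part)
    return scores


def auc(scores, labels):
    """Probability that a random diseased sample (label 1) outscores a
    random healthy sample (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical genes and values): 2 diseased, 2 healthy samples.
toy_expr = {
    "GENE_UP": [8.0, 9.0, 2.0, 1.0],
    "GENE_DOWN": [1.0, 2.0, 7.0, 8.0],
    "GENE_OTHER": [5.0, 5.0, 5.0, 5.0],
}
toy_labels = [1, 1, 0, 0]
toy_scores = signature_scores(toy_expr, ["GENE_UP"], ["GENE_DOWN"])
toy_auc = auc(toy_scores, toy_labels)
```

Because the score is computed from within-dataset z-scores, the same function could be applied separately to the microarray data and to the RNA-seq data without putting the two on a common scale; whether that is adequate for a given signature is something to check against the known labels.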
Hi, I have a similar question. I have some DEGs from my animal model and plan to validate them in a GEO dataset. Is that a correct approach? The GEO dataset is from patient blood, while my microarray data are from mouse blood. Could I check whether the human homologs of my animal DEGs are still differentially expressed in the GEO dataset? I could then train a classifier on the GEO dataset, using them as a gene signature.
Yes, you can definitely do that with the orthologous genes. I've done that in the past -- see if a gene signature in my mouse tumor model translates to human primary cancers.
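A minimal sketch of the mapping step, assuming you already have a mouse-to-human ortholog table as (mouse symbol, human symbol) pairs, for example exported from a resource such as Ensembl BioMart. The function name, the fold-change values, and the `MouseGeneX` / `HUMAN_X*` entries are placeholders for illustration. Keeping only one-to-one orthologs is one conservative choice, not the only option.

```python
from collections import Counter


def map_to_human(mouse_degs, ortholog_table):
    """mouse_degs: dict mouse symbol -> log fold change.
    ortholog_table: iterable of (mouse_symbol, human_symbol) pairs.
    Returns (mapped, dropped): human symbol -> log fold change for
    one-to-one orthologs, plus the mouse genes that could not be mapped."""
    pairs = set(ortholog_table)  # remove exact duplicate rows
    mouse_counts = Counter(m for m, _ in pairs)
    human_counts = Counter(h for _, h in pairs)
    # Keep a pair only if neither side appears in any other pair.
    one_to_one = {
        m: h for m, h in pairs
        if mouse_counts[m] == 1 and human_counts[h] == 1
    }
    mapped, dropped = {}, []
    for gene, lfc in mouse_degs.items():
        if gene in one_to_one:
            mapped[one_to_one[gene]] = lfc
        else:
            dropped.append(gene)
    return mapped, dropped


# Toy input: fold changes are invented; MouseGeneX is a placeholder for a
# gene with two human orthologs, MouseGeneY for a gene with none.
toy_table = [
    ("Trp53", "TP53"),
    ("Il6", "IL6"),
    ("MouseGeneX", "HUMAN_X1"),
    ("MouseGeneX", "HUMAN_X2"),
]
toy_degs = {"Trp53": -1.2, "Il6": 2.5, "MouseGeneX": 1.8, "MouseGeneY": 0.9}
toy_mapped, toy_dropped = map_to_human(toy_degs, toy_table)
```

The mapped fold changes keep their sign, so you can check whether each human ortholog changes in the same direction in the patient dataset.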
Thanks dsull. I also wonder: once I have validated the orthologous gene signature in a GEO dataset, what else could I do? My microarray data are from mouse blood, which may not lend itself to further mechanistic investigation, and in this multi-omics era it may be hard for us to publish a transcriptomics paper in a high-level journal. Could you please point me to some similar work to learn from?
It's difficult to answer because I don't know what scientific question you want to answer or what you hope to discover. You should have a question you're trying to answer or a hypothesis before diving into the data. Running a classifier on data without a good research question in mind will not get you published.
Re: transcriptomics, my recent paper used microarrays and was still published in a good journal: https://www.nature.com/articles/s41388-022-02458-9
If you can discover good biology or something of good medical relevance, that's all that matters.
The journal you published in is definitely a top-level journal. Thank you for your suggestions. They do give me some ideas, because my research focuses on several models in which the co-DEGs need to be validated.
I have a list that contains both upregulated and downregulated genes. I have tried GSEA, and those results are quite different from the DEGs I found through limma and did not make a lot of biological sense. I think training a classifier on TCGA data with my DEGs as features and dividing the data into train/test is a great idea and I will try that out! May I ask what the difference between a validation and a test set is in this situation? I assumed that I would be able to train on, say, 70% of the data and test on the remaining 30%.
Thanks kindly for your response!
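On the validation vs. test question, the usual distinction is that the validation set is used while you are still making choices (which genes to keep, how strongly to regularize, which model to use), and the test set is looked at only once, after everything is fixed, to get an unbiased performance estimate. A plain 70/30 train/test split is fine if nothing is tuned; otherwise part of the 70% has to act as validation data, or you can cross-validate within it. Below is a minimal sketch of a stratified three-way split; the 70/15/15 fractions, the function name, and the toy labels are assumptions for illustration.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/validation/test, keeping the
    diseased/healthy ratio roughly equal across the three sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "validation": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits["train"] += idx[:n_train]
        splits["validation"] += idx[n_train:n_train + n_val]
        # Whatever remains goes to the test set.
        splits["test"] += idx[n_train + n_val:]
    return splits


# Toy labels: 60 diseased (1) and 40 healthy (0) samples.
toy_labels = [1] * 60 + [0] * 40
toy_splits = stratified_split(toy_labels)
```

Any gene selection or scaling parameters should be derived from the training indices only and then applied unchanged to the validation and test samples; otherwise information leaks from the held-out data into the model.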