I downloaded microarray data for GSCs with PTEN loss as my experimental group, but my control group is RNA-seq data. How can I get differentially expressed genes from these two types of data?
Thanks!
I'd be extremely hesitant to call differentially expressed genes from two completely different kinds of data. RNA-seq directly sequences cDNA, whereas microarrays measure the fluorescence emitted by labeled target sequences hybridized to probes spotted on the chip. The two come with very different sources of bias, not to mention the batch effects of having different people handle cDNA extracted from different samples, in different places, at different times, with different experimental protocols. The whole point of a control data set when calling DE genes is to estimate the baseline/background expression values. By cooking up a control that doesn't match any of the expected characteristics of your treatment samples, I don't think you're doing yourself a favor.
That being said, I don't completely understand why an appropriate control sample is missing from the original submission. GSE7562 gives plenty of replicates for PTEN loss as well as WT (found by following GPL570, which was indicated as the original data set in the accession number you mentioned).
Was this data downloaded from a public repository such as NCBI GEO, SRA, or ArrayExpress? If so, the control and case samples were generated at different times and under different conditions, so there is a high chance of batch effects. The batch effect needs to be corrected first.
Do you have biological replicates for both case and control?
Normally you can use a program such as ComBat to combine RNA-seq and microarray data. However, to my knowledge, because you only have RNA-seq controls and microarray cases, correcting for the platform difference will also remove any difference between case and control, making it impossible to identify differential expression. I would be interested to hear of any other method that makes this possible, though.
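To make the confounding concrete, here is a toy Python sketch. It uses per-platform mean centering as a deliberately simplified stand-in for ComBat (which fits a more elaborate empirical Bayes model), and every number in it is invented for illustration.

```python
from statistics import mean

# Toy log-expression values for ONE gene; all numbers are invented.
# Every case is on the microarray and every control is RNA-seq,
# so platform and group are perfectly confounded.
samples = [
    # (platform, group, value)
    ("array", "case", 9.0),
    ("array", "case", 9.4),
    ("array", "case", 8.6),
    ("rnaseq", "control", 5.0),
    ("rnaseq", "control", 5.3),
    ("rnaseq", "control", 4.7),
]


def center_by_platform(rows):
    """Subtract each platform's mean: a crude stand-in for batch correction."""
    platforms = {p for p, _, _ in rows}
    platform_means = {
        p: mean(v for q, _, v in rows if q == p) for p in platforms
    }
    return [(p, g, v - platform_means[p]) for p, g, v in rows]


def group_difference(rows):
    """Mean(case) minus mean(control) for the gene."""
    case = mean(v for _, g, v in rows if g == "case")
    control = mean(v for _, g, v in rows if g == "control")
    return case - control


before = group_difference(samples)
after = group_difference(center_by_platform(samples))
print(f"case - control before correction: {before:.2f}")
print(f"case - control after correction:  {after:.2f}")
```

Because each platform holds only one group, removing the platform mean removes the group mean with it, so the case/control difference collapses to essentially zero no matter how large it was to begin with.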
I am familiar with RNA-seq data but not with microarray data, and I find the ComBat documentation is not very detailed. The GEO data I used were some samples from GSE64985, which was normalized to obtain an integrated gene expression atlas across diverse biological sample types and conditions. What should my input file be if I use ComBat to combine it with the RNA-seq data? And what format should it be in?
Thanks!
I don't know whether I can scale the RNA-seq data and microarray data as GSE64985 describes: "performing quantile normalization on the entire compendium using the limma R package (version 2.18.2) in order to reconcile broader differences between datasets and ensure that all arrays were on the same scale".
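For reference, the quantile normalization quoted above forces every sample to share the same empirical distribution. A minimal Python sketch of the idea follows; it is not the limma implementation, the function name and toy values are made up, and ties are handled naively.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of values.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must cover the same genes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Invented values for four genes measured in one array and one RNA-seq sample.
array_sample = [2.0, 8.0, 5.0, 11.0]
rnaseq_sample = [0.5, 3.0, 1.0, 6.5]
norm_array, norm_rnaseq = quantile_normalize([array_sample, rnaseq_sample])
print(norm_array)
print(norm_rnaseq)
```

This only puts the samples on a common scale. It does not by itself remove gene-specific platform bias, so it would not resolve the case/control versus platform confounding discussed in this thread.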
What you will need is the expression levels from the microarray data and the RNA-seq count data. Normalized microarray data should normally be fine. However, as I mentioned, when you correct for the platform difference, you will most likely also remove the difference between the two groups of samples, which makes anything you identify afterwards questionable.
If you really want to do it, I can try to dig out my ComBat scripts for you.
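As a rough illustration of the input, ComBat in the R sva package takes a genes-by-samples matrix of log-scale expression plus one batch label per sample. The Python sketch below only shows how such a table could be assembled; the gene values, dictionary layout, and the `log2_cpm` helper are all invented for this example, and it is not my actual script.

```python
import math

# Hypothetical toy inputs; values are invented.
# Normalized microarray log2 intensities: gene -> one value per sample.
array_log2 = {
    "PTEN": [6.1, 5.9],
    "TP53": [8.2, 8.4],
    "EGFR": [10.3, 10.1],
}
# Raw RNA-seq read counts: gene -> one value per sample.
rnaseq_counts = {
    "PTEN": [300, 500],
    "TP53": [600, 400],
    "ACTB": [100, 100],
}


def log2_cpm(counts, pseudocount=1.0):
    """Convert raw counts to log2(counts per million + pseudocount)."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [
            math.log2(counts[g][j] / lib_sizes[j] * 1e6 + pseudocount)
            for j in range(n_samples)
        ]
        for g in genes
    }


rnaseq_log2 = log2_cpm(rnaseq_counts)

# Keep only genes measured on both platforms, in a fixed order.
shared_genes = sorted(set(array_log2) & set(rnaseq_log2))

# Genes-by-samples table: microarray columns first, then RNA-seq columns.
combined = {g: array_log2[g] + rnaseq_log2[g] for g in shared_genes}

# One batch label per column, in the same order as the columns.
n_array = len(array_log2[shared_genes[0]])
n_rnaseq = len(rnaseq_log2[shared_genes[0]])
batch = ["array"] * n_array + ["rnaseq"] * n_rnaseq

for gene in shared_genes:
    print(gene, [round(v, 2) for v in combined[gene]])
print(batch)
```

Mapping microarray probes to gene symbols is left out here; in practice that step has to happen before the two tables can be intersected.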
Thanks for your reply! I'd appreciate it if you could share your ComBat scripts. They would be a great help in learning to work with microarray data.
Again, I cannot stress this enough: this worked for me because I only added some extra microarray samples as control data (i.e. I already had both cases and controls in my RNA-seq data), so most of the true effect shouldn't have been filtered out in my case. In your case, I'm not even sure the program will let you run with only microarray cases and RNA-seq controls. In a sense, you are using oranges as the control and comparing them with apples.
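A quick way to see the difference between the two designs is to tabulate which groups appear on which platform before attempting any correction. This is a small Python sketch with hypothetical helper names and made-up sample labels, not part of ComBat itself.

```python
from collections import defaultdict


def groups_per_batch(batches, groups):
    """Map each batch (platform) to the set of groups observed in it."""
    table = defaultdict(set)
    for batch, group in zip(batches, groups):
        table[batch].add(group)
    return dict(table)


def fully_confounded(batches, groups):
    """True if there are several groups but no batch contains more than one."""
    table = groups_per_batch(batches, groups)
    return len(set(groups)) > 1 and all(len(g) == 1 for g in table.values())


# A design like mine: RNA-seq has cases and controls, arrays add controls.
mixed_batches = ["rnaseq"] * 4 + ["array"] * 2
mixed_groups = ["case", "case", "control", "control", "control", "control"]

# A design like yours: all cases on arrays, all controls on RNA-seq.
split_batches = ["array"] * 3 + ["rnaseq"] * 3
split_groups = ["case"] * 3 + ["control"] * 3

print(fully_confounded(mixed_batches, mixed_groups))
print(fully_confounded(split_batches, split_groups))
```

When the check reports full confounding, no platform contains both groups, so there is nothing within a platform from which the case/control effect could be estimated separately from the platform effect.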
Thank you! Maybe I will give up on finding differentially expressed genes between microarray data and RNA-seq data. Your script helped me a lot in learning to use ComBat. It may be useful next time.