3.9 years ago
singlecell_bio
Hello, I recently downloaded transcriptome datasets from the GEO archive. I am trying to compare them to custom experimental datasets to find overlapping genes and differentially expressed genes. I previously tried the supplementary files, the .soft files, and the GEOquery package to access the transcriptome data, but none of them seem to contain it. I found that the .csv files have the names of all the genes and their corresponding expression levels for days 0-22 (neuroectodermal differentiation). I am not sure how to use the .csv files to compare the expression data in R. Thanks in advance.
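A minimal sketch of the overlap step, assuming each .csv has one row per gene with the gene symbol in a column named `gene` and one numeric column per sample or time point. The column name, the toy values, and the helper `read_expression` are hypothetical, and the sketch is written in Python with only the standard library, although the same logic carries over to R.

```python
import csv
import io


def read_expression(handle, gene_column="gene"):
    """Return {GENE_SYMBOL: [expression values]} from an open CSV handle."""
    table = {}
    for row in csv.DictReader(handle):
        # Upper-case the symbol so that 'sox10' and 'SOX10' match (an assumption
        # that suits human gene symbols; adjust for other species).
        gene = row.pop(gene_column).strip().upper()
        table[gene] = [float(value) for value in row.values()]
    return table


# Toy stand-ins for a GEO .csv and a custom dataset (values are made up).
geo_csv = io.StringIO(
    "gene,day0,day8,day22\n"
    "SOX10,0.5,12.0,30.1\n"
    "PAX6,1.0,25.3,18.4\n"
    "GAPDH,900.0,880.0,910.0\n"
)
custom_csv = io.StringIO(
    "gene,control,treated\n"
    "sox10,2.1,9.8\n"
    "TFAP2A,1.4,7.7\n"
    "GAPDH,850.0,870.0\n"
)

geo = read_expression(geo_csv)
custom = read_expression(custom_csv)

# Genes present in both datasets.
overlap = sorted(set(geo) & set(custom))
print(overlap)
```

With real files, the `io.StringIO` objects would be replaced by `open("your_file.csv", newline="")`.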
Can you please provide an example GSE number so that I can take a look?
GSE107552 GSE103715 GSE147270
This is what the data in the .csv file looks like:
https://ibb.co/Cw56j8c
Hi, those studies are:
It will be difficult to compare samples and genes across each study. For the 2 RNA-seq studies, although the data is available in FPKM expression units, batch effects will exist.
Can you elaborate more on what your ultimate goal was?
Thank you for the insight, Kevin! The ultimate goal is to compare these datasets to another experimental transcriptome dataset (given to me by my professor). The goal is to find new candidate genes that overlap with genes expressed in ectoderm or neuroectoderm development, and to establish whether they are co-expressed with neural crest progenitors.
I see. You just have to be conscious of how each dataset is normalised. For example, you cannot compare RNA-seq FPKM versus RMA-normalised microarray data without first attempting to 'standardise' each dataset and deal with any batch effects.
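As a rough illustration of what 'standardising' could mean here, the sketch below log2-transforms FPKM values and then z-scores each gene within its own dataset, so that RNA-seq and microarray values end up on a comparable, unitless scale. This is only one possible approach and an assumption on my part; the toy values and the helper names `log2_fpkm` and `zscore` are hypothetical. It does not remove batch effects, for which dedicated methods such as ComBat (in the R package sva) exist.

```python
import math
import statistics


def log2_fpkm(values, pseudocount=1.0):
    """log2(FPKM + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def zscore(values):
    """Centre and scale one gene's values across the samples of one dataset."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A gene with no variation carries no information after scaling.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Toy data (made up): FPKM from RNA-seq, log2-scale intensities from RMA.
rnaseq_fpkm = {"GENE_A": [0.0, 3.0, 15.0, 63.0], "GENE_B": [7.0, 7.0, 7.0, 7.0]}
microarray_rma = {"GENE_A": [5.1, 6.0, 7.9, 9.2], "GENE_B": [8.0, 8.1, 7.9, 8.0]}

# RMA output is already on a log2 scale, so only the FPKM data is transformed.
rnaseq_z = {g: zscore(log2_fpkm(v)) for g, v in rnaseq_fpkm.items()}
microarray_z = {g: zscore(v) for g, v in microarray_rma.items()}

shared = sorted(set(rnaseq_z) & set(microarray_z))
for gene in shared:
    print(gene, [round(z, 2) for z in rnaseq_z[gene]],
          [round(z, 2) for z in microarray_z[gene]])
```

Z-scoring within each dataset makes expression patterns across samples comparable, but absolute levels are lost, so it suits pattern comparison rather than direct differential expression between studies.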
If it's impossible to directly compare datasets, we can process them independently and 'meta-analyse' the p-values from each.
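One common way to 'meta-analyse' p-values is Fisher's method, sketched below under the assumption that each study has been analysed independently and yields one p-value per gene. The gene names, the p-values, and the helper `fisher_combine` are hypothetical. Fisher's method ignores the direction of the effect, so fold changes should still be checked for consistency across studies.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values for one gene using Fisher's method."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    k = len(pvalues)
    # Test statistic: -2 * sum(ln p); tiny p-values are clamped to avoid log(0).
    stat = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Under the null, stat follows a chi-square distribution with 2k degrees
    # of freedom. For even degrees of freedom the upper tail has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


# Made-up per-study p-values for two genes across three studies.
per_study = {
    "GENE_A": [0.01, 0.04, 0.03],
    "GENE_B": [0.40, 0.75, 0.20],
}

combined = {gene: fisher_combine(ps) for gene, ps in per_study.items()}
for gene, p in sorted(combined.items(), key=lambda item: item[1]):
    print(gene, p)
```

The combined p-values would still need multiple-testing correction across genes before picking candidates.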
I believe it's going to be more like a meta-analysis. The p-values and the fold change data would be more than enough to identify new candidates. However, I am not sure how each of the datasets can be normalized independently before comparison. I tried using fold change functions but was unsure whether there is a package that can normalize the data, calculate the p-values, and compute the fold changes.