How can I get differentially expressed genes from Gene Expression Omnibus?
5.3 years ago

I am doing research for which I need the differentially expressed genes of some diseases. How can I get these data, and are there any tools that can help me obtain them? A beginner-level overview or tutorial would be ideal.

Please help me and thanks in advance.

gene DEGs GEO • 2.1k views
5.3 years ago

Do you already know which data you want? Are you interested in learning how to code in the R programming language?

To access Gene Expression Omnibus (GEO) data, a starting point could be GEO2R: About GEO2R - how to use.

Also, on each GEO accession (record) page, you will usually find a blue button:

[screenshot: the blue button on a GEO accession page]

If you click on that button, you will be taken to a new page where you can, for example, copy R code that lets you download the normalised data and perform any type of analysis that you want:

[screenshot: Captura-de-tela-de-2019-07-27-10-36-55]

There are also other GEO2R-specific functions on that page that permit you to perform differential expression analysis.

Kevin


Thanks, Kevin. I have learned R, but I did not know about GEO analysis with R. This will help me a lot.


Great


Dear Kevin, many thanks for your previous help. I am doing research for which DEGs are needed, and I am following a paper whose experiment was based on the series GSE56721. I obtained the differentially expressed genes by following your suggestion and selected the significant genes as the authors did (p-value < 0.05 and logFC >= 1). However, I am not getting the same result (article: 543 DEGs; mine: 430 DEGs). In addition, they divided the genes into two groups: up- and down-regulated.

I have searched online but cannot work out how to identify these up- and down-regulated genes. Could you please help me with that?


Hey, it is great that you have already processed the data. I would not worry too much about not getting the same results as the authors - many published studies are difficult to reproduce, in part because the complete methods are rarely included in manuscripts.

With regard to up- and down-regulated genes, they probably mean that they divided their differentially expressed genes into 2 groups, like this:

  1. p-value < 0.05 and logFC >=1
  2. p-value < 0.05 and logFC <=-1

You should probably be using an adjusted p-value, though.
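
As a rough sketch of the split described above, the following base R code separates a results table into up- and down-regulated genes using the adjusted p-value. The toy table only mimics the columns of a limma topTable() result; the probe names and numbers are invented for illustration, and split_degs() is a hypothetical helper, not part of limma or GEO2R.

```r
# Toy results table standing in for limma's topTable() output.
# All IDs and values are invented for illustration only.
tT <- data.frame(
  ID        = c("probe_a", "probe_b", "probe_c", "probe_d", "probe_e"),
  logFC     = c(2.3, -1.8, 0.4, 1.2, -1.0),
  P.Value   = c(0.0001, 0.0004, 0.2000, 0.0300, 0.0010),
  adj.P.Val = c(0.0020, 0.0100, 0.6000, 0.2000, 0.0400),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split significant genes by direction of change.
split_degs <- function(tab, p_col = "adj.P.Val", p_cut = 0.05, lfc_cut = 1) {
  # Rows with missing statistics are treated as not significant
  ok  <- !is.na(tab[[p_col]]) & !is.na(tab$logFC)
  sig <- ok & tab[[p_col]] < p_cut
  list(
    up   = tab[sig & tab$logFC >=  lfc_cut, , drop = FALSE],
    down = tab[sig & tab$logFC <= -lfc_cut, , drop = FALSE]
  )
}

degs <- split_degs(tT)
print(degs$up$ID)    # probes passing the "up" rule
print(degs$down$ID)  # probes passing the "down" rule
```

Passing p_col = "P.Value" would reproduce the nominal p-value rule from the list above, which makes it easy to compare the two gene counts on the same table.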


Thank You so much, Kevin. You made my day. Now I understand things clearly. In future, I may need your help again.


Hello Kevin, I hope you are doing well, and many thanks for your last help. I need your help again. If I cannot get enough significant DEGs based on the adjusted p-value, can I use the nominal p-value instead to select significant DEGs?

If I do, would that cause any problem with the analysis?

Let me know what you suggest. Thanks again.


Hello again. If you use nominal (un-adjusted) p-values, then it may be more difficult to publish the work in a scientific journal. It may be better to relax the threshold for the adjusted p-value to, for example, 0.1.
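
To illustrate the difference between nominal and adjusted p-values, here is a minimal sketch using base R's p.adjust() with the Benjamini-Hochberg method, which is the same correction that adjust="fdr" requests in topTable(). The five p-values are invented for illustration and do not come from any GEO dataset.

```r
# Invented nominal p-values, for illustration only
p <- c(0.001, 0.008, 0.039, 0.041, 0.200)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust)
adj <- p.adjust(p, method = "BH")

# Number of tests passing each rule
n_nominal_005 <- sum(p   < 0.05)  # nominal p-value < 0.05
n_adj_005     <- sum(adj < 0.05)  # adjusted p-value < 0.05
n_adj_010     <- sum(adj < 0.10)  # adjusted p-value < 0.1

print(data.frame(p = p, adj.P.Val = adj))
print(c(nominal_0.05 = n_nominal_005, adj_0.05 = n_adj_005, adj_0.10 = n_adj_010))
```

Adjusted p-values are never smaller than the nominal ones, so fewer genes pass at the same cutoff; relaxing the adjusted threshold keeps the false discovery rate interpretation, which the nominal rule does not provide.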


Thanks for your valuable feedback, ok I will try my best. Many many thanks again. You helped a lot.


Dear Kevin, I hope you are doing well. Unfortunately, the number of genes passing the threshold (adj. p-value < 0.1) is very low (count: 5). When I set the threshold to 0.5, I get plenty of genes. Is this threshold acceptable for DEG analysis? I am confused about what to do, and your suggestions would be much appreciated. If it is not acceptable, I will have to skip the analysis.


A threshold of 0.5 is too high - I would not feel comfortable using it. Perhaps you could share all of your code if you want further help?


First, I take the R script from GEO2R, run it in R, write the result to a text file, and copy it into an Excel sheet. I then do the filtering in the Excel sheet. The logic is: if adj. p-value < 0.5 and the |logFC| condition is true, consider the gene significant. Maybe I am failing to explain the problem clearly.


Showing the code for the limma part (design / contrast matrix) would help.


Here is the script, Kevin. Please have a look and let me know what you think.

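# packages required by the GEO2R-generated script
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO
gset <- getGEO("GSE118370", GSEMatrix=TRUE, AnnotGPL=TRUE)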
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- "0110XXX10XXX"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# eliminate samples marked as "X"
sel <- which(sml != "X")
sml <- sml[sel]
gset <- gset[ ,sel]

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=TRUE))
LogC <- (qx[5] > 100) ||
  (qx[6]-qx[1] > 50 && qx[2] > 0) ||
  (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) {
  ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex)
}

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=nrow(fit2))

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","logFC","Gene.symbol"))
write.table(tT, file="GSE118370_LA.txt", row.names=FALSE, sep="\t")

Hey, there is one gene that passes FDR < 0.1: MME (203435_s_at).

The problem is that the study is very small, with only 6 samples (3 per group: G1 and G0).

The small sample size will always be a limitation with this study.
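
The group sizes can be checked directly from the sample-selection string used in the script above. This is a small base R sketch; it only decodes the gsms string and does not download anything from GEO.

```r
# Sample-selection string from the GEO2R script: one character per sample,
# "0" and "1" are the two groups, "X" marks excluded samples
gsms <- "0110XXX10XXX"

# Split into single characters and drop the excluded samples
sml <- strsplit(gsms, "")[[1]]
sml <- sml[sml != "X"]

# Count samples per group, using the same "G" prefix as the script
group_sizes <- table(paste0("G", sml))
print(group_sizes)
```

With so few samples per group, the moderated t-statistics have little power, which is consistent with so few genes surviving the FDR correction.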


I got the same result. If the datasets are from different platforms, should I still analyse them? What should I consider when I want to compare datasets from two diseases? If the tissues are not the same, is the analysis still valid?

Thank you so much for your valuable feedback.


I provide some guidance here about merging across different microarray datasets: A: How to integrate multiple data sets from microarray platform prior meta-analysis

Regarding tissues, it is impossible to say without knowing the tissues and without first analysing the data. Some tissues are 'completely' different in their transcriptional profiles, such as CNS versus skin tissues, whereas others are more similar. If you have any more questions, it may be better to open a new question. Thanks!


Hey, I can't thank you enough for all that you have done for me. I really appreciate you. You are so helpful.
