Pipeline for analyzing Microarray & RNA-seq GSE files from NCBI GEO
6.3 years ago
jsl ▴ 50

I have very limited experience in R, but I would like to know if anyone can share their pipeline for analyzing GSE files from GEO, for microarray and/or RNA-seq data. The eventual goal is to identify the differentially expressed genes.

For example, I would like to analyze GSE113590, which is an RNA-seq dataset, and GSE47045, which is a microarray dataset.

The general consensus seems to be that you download the data using this:

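# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")
library("GEOquery")
gset <- getGEO("GSE113590", GSEMatrix = TRUE)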

But I'm not sure how to move forward from here, and there seems to be a different pipeline depending on whether the data are microarray or RNA-seq.

Thanks for your help.

RNA-Seq microarray R pipline • 4.5k views

Hello junsionglow!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/51689293/

This is typically not recommended as it runs the risk of annoying people in both communities.


Understood, my apologies. I have taken down the post on Stack Overflow.

6.3 years ago
h.mon 35k

The distributions of RNA-seq counts and microarray intensities are very different, hence the need for different packages. limma is the go-to package for microarray analysis. For RNA-seq counts, the main options are edgeR, DESeq2, and limma (after applying the voom transformation to the counts).
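To make the difference concrete, here is a minimal sketch of both routes through limma. It runs on simulated data, not on the GEO series from the question: the sample layout (3 control vs 3 treated), the matrix sizes, and all object names are assumptions made up for illustration. A real RNA-seq analysis would normally also filter low-count genes and normalize library sizes (for example with edgeR) before voom; those steps are left out here to keep the sketch short.

```r
# Simulated example: 200 features x 6 samples (3 control, 3 treated).
# All names and sizes here are illustrative assumptions.
set.seed(1)
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)  # intercept + treated-vs-control coefficient

# Microarray-like data: log2 intensities, roughly normal.
log_intensity <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
                        dimnames = list(paste0("probe", 1:200), paste0("S", 1:6)))

# RNA-seq-like data: non-negative integer counts, overdispersed
# (negative binomial).
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))

if (requireNamespace("limma", quietly = TRUE)) {
  # Microarray route: linear model per probe, then empirical Bayes moderation.
  fit_array <- limma::eBayes(limma::lmFit(log_intensity, design))
  top_array <- limma::topTable(fit_array, coef = 2, number = 10)

  # RNA-seq route: voom converts counts to log2-CPM with precision weights,
  # after which the same limma steps apply.
  v       <- limma::voom(counts, design)
  fit_seq <- limma::eBayes(limma::lmFit(v, design))
  top_seq <- limma::topTable(fit_seq, coef = 2, number = 10)
}
```

With real data, `log_intensity` would be replaced by `exprs(gset)` for the microarray series, `counts` by a raw count matrix, and `group` by the sample annotation of the study.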


I see. But I can't seem to extract the counts from the gset. I know that for microarray data, one could use

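library("GEOquery")  # also attaches Biobase, which provides exprs()
gset <- getGEO("GSE47045", GSEMatrix = TRUE)
# getGEO() returns a list; pick the GPL6246 platform if there are several entries
if (length(gset) > 1) idx <- grep("GPL6246", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

str(exprs(gset))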

I get

 num [1:34760, 1:24] 12.85 11.2 7.58 13.42 6.72 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...
  ..$ : chr [1:24] "GSM1143711" "GSM1143712" "GSM1143713" "GSM1143714" ..

But when it comes to Illumina RNA-seq data, the same "if" line yields NULL counts.


Can you provide an example of an RNA-seq accession which causes trouble?


This is the one I'm trying to analyze: GSE113590.
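A likely explanation, offered as an assumption and not as something checked against this particular series: for sequencing submissions the series matrix that getGEO() parses often carries only the sample annotation, so exprs() comes back empty, and the processed counts (if the submitter provided any) sit in the supplementary files. The sketch below wraps GEOquery's getGEOSuppFiles() in a small helper. The helper name `fetch_supp` and its `dest` argument are made up for illustration, and the names and formats of the supplementary files for GSE113590 are not known here, so they have to be inspected after listing.

```r
# Hypothetical helper (not part of GEOquery): list, then download, the
# supplementary files attached to a GEO series.
fetch_supp <- function(accession, dest = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is not installed")
  }
  # fetch_files = FALSE lists the available files without downloading them.
  available <- GEOquery::getGEOSuppFiles(accession, fetch_files = FALSE)
  print(available)
  # Download into dest; a subdirectory named after the accession is created.
  GEOquery::getGEOSuppFiles(accession, baseDir = dest)
}

# Usage (needs network access), left commented out on purpose:
# files <- fetch_supp("GSE113590")
# rownames(files)  # local paths of the downloaded files
```

Whatever count table turns up would then be read with the usual tools (read.delim() or similar, depending on the file format) and passed to edgeR, DESeq2, or limma-voom.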

6.3 years ago
ewre ▴ 250

Not sure if this one is helpful for you, because it deals with ArrayExpress, which is hosted by EBI rather than NCBI. As far as I know, most of the datasets in GEO can also be found in ArrayExpress.
