How to get the gene expression matrix from GEO when getGEO returns 0 features?
5.5 years ago
bioyas ▴ 20

Hi,

I would like to download the gene expression data with GEO accession number "GSE104075" from the GEO repository.

Here is my code:

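library(GEOquery)  # provides getGEO()
library(Biobase)   # provides exprs()

gset <- getGEO("GSE104075", GSEMatrix = TRUE, getGPL = TRUE, AnnotGPL = TRUE)
# If the series spans several platforms, pick the GPL21298 entry
if (length(gset) > 1) idx <- grep("GPL21298", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# Extract the expression matrix (features x samples)
ex <- exprs(gset)

str(ex)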
logi[0 , 1:26] 
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:26] "GSM2789021" "GSM2789022" "GSM2789023" "GSM2789024" ..

Apparently, the submitters did not deposit the gene expression data in GEO, so I do not know how to download it.

Do you have any idea how I can get the gene expression matrix?

GETquery ATAC-seq GEO getGEO RNA-Seq • 6.4k views
5.5 years ago

That is not a microarray study, so there is no series matrix file with expression values for getGEO to download. It is a next-generation sequencing study comprising ATAC-seq and RNA-seq. If you want to use the data, you should check the SRA accession page: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA408158

They may only have made the raw data available, though.

Kevin
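
One way to confirm this from R is to test whether the series matrix carries any feature rows, and then list what the submitters deposited as supplementary files. This is only a minimal sketch: it assumes a recent GEOquery release in which getGEOSuppFiles() accepts the fetch_files argument, and the helper names has_expression_values and list_supp_files are made up for illustration.

```r
# Hypothetical helper: TRUE if an expression matrix has at least one feature row.
has_expression_values <- function(mat) {
  is.matrix(mat) && nrow(mat) > 0
}

# Hypothetical helper: list (without downloading) the supplementary files of a
# GEO series. Assumes GEOquery is installed and that your version supports
# the fetch_files argument.
list_supp_files <- function(accession) {
  GEOquery::getGEOSuppFiles(accession, fetch_files = FALSE)
}

# An empty matrix shaped like the one in the question: 0 features, 26 samples.
empty_ex <- matrix(logical(0), nrow = 0, ncol = 26)
has_expression_values(empty_ex)  # FALSE

# Usage (needs network access):
# list_supp_files("GSE104075")
```

If the first check is FALSE for exprs(gset), the series matrix holds sample metadata only, and any quantification has to come from the supplementary files or from the raw reads.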


Thanks for your response. On the GEO page I can see a file "GSE104075_RAW.tar" that contains the bed files. I have extracted it and I see files ending in ".bed" and ".bedGraph". Do you think I can use these files to get the gene expression data?

Do you think I can convert the bed files to bam files and use featureCounts to get the gene expression counts?

Thanks


You should clarify what is contained within the BED and bedGraph files. Gene expression data is not typically stored in these formats. The BED and bedGraph files most likely contain the ATAC-seq data, which is typically stored in these formats.

You may also want to contact the authors directly to see if they can share the expression matrix with you. I looked briefly and could not find it.

5.5 years ago
ATpoint 86k

As Kevin Blighe says, this is RNA-seq. It is often not obvious what exactly the uploaded files in the RAW section are. I personally never trust them (not saying the authors are incompetent, but one simply cannot reproduce them without the exact commands etc., which are often not available). The simplest approach is to download the raw data (see "Fast download of FASTQ files from the European Nucleotide Archive (ENA)") and then use a lightweight quantifier such as salmon to quantify the reads against a reference transcriptome. You can then use tximport to summarize the transcript counts to the gene level. Please use Google and the search function, and read the manuals of the tools. Many posts on this are available.
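
The tximport step could look roughly like the sketch below. It is not a tested pipeline for this dataset: the sample names, the salmon_quant directory layout, the tx2gene.csv file, and the quant_files helper are all hypothetical placeholders, and it assumes salmon has already been run once per RNA-seq sample.

```r
# Hypothetical helper: build a named vector of salmon outputs, one quant.sf per sample.
quant_files <- function(samples, quant_dir = "salmon_quant") {
  files <- file.path(quant_dir, samples, "quant.sf")
  names(files) <- samples
  files
}

samples <- c("sampleA", "sampleB")  # placeholder names, not real sample IDs
files <- quant_files(samples)

# Only run the import when tximport and the (assumed) input files are present.
if (requireNamespace("tximport", quietly = TRUE) &&
    all(file.exists(files)) && file.exists("tx2gene.csv")) {
  # tx2gene: two columns, transcript ID first and gene ID second (assumed CSV with header).
  tx2gene <- read.csv("tx2gene.csv")
  txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)
  print(head(txi$counts))  # gene-level estimated counts, genes x samples
}
```

The transcript IDs in tx2gene have to match those in the transcriptome FASTA used to build the salmon index, otherwise tximport cannot map transcripts to genes.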

The bed files you mention are the ATAC-seq peak summits, and the bedGraphs are browser tracks for visualizing the RNA-seq in a genome viewer such as IGV. None of these will reliably or meaningfully give you raw gene expression counts.
