I am planning to use the Bioconductor package GEOquery to download a couple of microarray datasets from NCBI GEO.
Then I would like to export a subset of the metadata and the expression data to flat files that I can import elsewhere.
What I have so far is:
```r
library(GEOquery)
library(R.utils)

geo_id <- "GSE45016"
gse <- getGEO(geo_id, GSEMatrix = FALSE)

# show the series-level metadata
Meta(gse)

# show the metadata for the first sample
GSMList(gse)[[1]]

# select a specific field from the metadata of the first sample
GSMList(gse)[[1]]@header$characteristics_ch1
# Result for sample 1:
# [1] "tissue: normal prostate (NP) epithelial cells"

GSMList(gse)[[2]]@header$characteristics_ch1
# Result for sample 2:
# [1] "tissue: prostate cancer cells"   "clinical stage: clinical T4N0M1"
# [3] "gleason score: GS 9"             "psa level: PSA 5477ng/ml"
```
As you can see, the number of key-value pairs differs between samples 1 and 2. What I would like is an array for every key under @header$characteristics_ch1, containing the value (or null if the key is missing) for every sample in the GEO dataset, for example:
key_tissue: normal prostate (NP) epithelial cells\tprostate cancer cells
key_psa_level: null\tPSA 5477ng/ml
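One way to get this layout, as a minimal base-R sketch: split each "key: value" string at its first colon, take the union of keys across all samples, and look every key up in every sample so that a missing key becomes NA (written out as null). The helper names characteristics_table and format_key_lines and the sample names sample1/sample2 are hypothetical, and the input simply mirrors the two samples shown above. With the real series, the input list could presumably be built from GSMList(gse) as noted in the code comment; that line is an assumption and is not run here.

```r
# Sketch: turn characteristics_ch1 "key: value" strings into a key x sample table.
# Helper names and sample names are hypothetical.

# Parse each sample's strings and align them on the union of all keys;
# a key that a sample lacks becomes NA in that sample's column.
characteristics_table <- function(char_list) {
  parsed <- lapply(char_list, function(x) {
    keys   <- trimws(sub(":.*$", "", x))     # text before the first colon
    values <- trimws(sub("^[^:]*:", "", x))  # text after the first colon
    setNames(values, keys)
  })
  all_keys <- unique(unlist(lapply(parsed, names), use.names = FALSE))
  # Indexing a named vector by a missing name yields NA
  tab <- do.call(cbind, lapply(parsed, function(p) unname(p[all_keys])))
  rownames(tab) <- all_keys
  tab
}

# One line per key: "key_<name>: value1<TAB>value2...", with NA written as "null"
format_key_lines <- function(tab) {
  vals <- tab
  vals[is.na(vals)] <- "null"
  keys <- paste0("key_", gsub("[^A-Za-z0-9]+", "_", rownames(tab)))
  paste0(keys, ": ", apply(vals, 1, paste, collapse = "\t"))
}

# Input mirroring the two samples shown in the question.
# With real data (assumption, not run here):
#   chars <- lapply(GSMList(gse), function(g) Meta(g)$characteristics_ch1)
chars <- list(
  sample1 = "tissue: normal prostate (NP) epithelial cells",
  sample2 = c("tissue: prostate cancer cells", "clinical stage: clinical T4N0M1",
              "gleason score: GS 9", "psa level: PSA 5477ng/ml")
)
tab <- characteristics_table(chars)
key_lines <- format_key_lines(tab)
writeLines(key_lines)  # pass a file path as the second argument to write to disk
```

Keys are taken in order of first appearance, so keys that occur only in later samples are still included; if a key were repeated within one sample, only its first value would be kept.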
Other metadata fields, such as "title", luckily have only a single value per sample:
```r
GSMList(gse)[[1]]@header$title
# [1] "Normal prostate"
GSMList(gse)[[2]]@header$title
# [1] "High-grade PC1"
```
I would also like to collect these into an array for the key title.
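For single-valued fields like title, a simpler sketch is enough: pull the field out of each sample's header list, use NA when it is absent, and join the values into one tab-separated line. The helper name header_field and the mock headers are hypothetical; with the real series, the header lists would presumably come from lapply(GSMList(gse), Meta), which is an assumption and is not run here.

```r
# Sketch: collect one single-valued header field (e.g. "title") across samples.
# header_field and the mock headers below are hypothetical.
header_field <- function(headers, field) {
  vapply(headers, function(h) {
    v <- h[[field]]
    # NA if the field is absent; join defensively if it ever has several values
    if (is.null(v) || length(v) == 0) NA_character_ else paste(v, collapse = "; ")
  }, character(1))
}

# With real data (assumption, not run here): headers <- lapply(GSMList(gse), Meta)
headers <- list(
  sample1 = list(title = "Normal prostate"),
  sample2 = list(title = "High-grade PC1")
)
titles <- header_field(headers, "title")
title_line <- paste0("key_title: ",
                     paste(ifelse(is.na(titles), "null", titles), collapse = "\t"))
writeLines(title_line)
```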
My second question is how to export the expression data that is stored under every sample. I would like to stream through all the probes, get the expression value of each probe for every sample, and write the result to another CSV file.
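A minimal sketch of the export, assuming each sample's data table has ID_REF and VALUE columns (as the per-sample tables returned by GEOquery's Table() typically do): collect the probe IDs, align every sample's VALUE column to them with match(), and write one probe-by-sample CSV. The helper name expression_matrix, the probe IDs and the sample names are hypothetical mock data. Note that this builds the whole matrix in memory rather than streaming probe by probe, which is usually manageable for a couple of microarray series; alternatively, getGEO(geo_id, GSEMatrix = TRUE) returns ExpressionSet objects whose exprs() matrix could be written out directly.

```r
# Sketch: build a probe x sample expression matrix and write it to CSV.
# expression_matrix, probe IDs and sample names are hypothetical.
expression_matrix <- function(sample_tables) {
  # Union of probe IDs, in order of first appearance
  probes <- unique(unlist(lapply(sample_tables, function(t) as.character(t$ID_REF)),
                          use.names = FALSE))
  # Align each sample's values to the probe order; absent probes become NA
  values <- lapply(sample_tables, function(t) t$VALUE[match(probes, as.character(t$ID_REF))])
  mat <- do.call(cbind, values)
  rownames(mat) <- probes
  mat
}

# With real data (assumption, not run here): tables <- lapply(GSMList(gse), Table)
tables <- list(
  sample1 = data.frame(ID_REF = c("probe_A", "probe_B", "probe_C"), VALUE = c(7.1, 5.4, 9.0)),
  sample2 = data.frame(ID_REF = c("probe_B", "probe_A"), VALUE = c(6.2, 8.3))
)
mat <- expression_matrix(tables)

out_file <- file.path(tempdir(), "expression.csv")  # choose your own path
write.csv(data.frame(ID_REF = rownames(mat), mat, check.names = FALSE),
          out_file, row.names = FALSE, na = "")
```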
Hi Neilfws, how did you write this reply? First, it is on GitHub; second, how do you prepare something like that on GitHub? Thanks.
Nicely done.