How to search/filter GEO datasets by platform (Illumina/Affymetrix)
10.2 years ago

I am trying to find a very specific GEO dataset (http://www.ncbi.nlm.nih.gov/gds) for my study. I can filter GEO datasets by organism and study type, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter]

But how can I narrow my search by platform, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter] AND "Affymetrix"[Platform]

Or add a tissue filter, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter] AND "Affymetrix"[Platform] AND "brain"[Tissue]
10.2 years ago

You might take a look at the GEOmetadb package for full-text or SQL queries of NCBI GEO metadata. NCBI GEO metadata have been parsed into a SQLite database that can be queried from R, any other language that has SQLite bindings, or using the sqlite command-line interface.


Sean Davis, getSQLiteFile() returns:

trying URL 'http://dl.dropbox.com/u/51653511/GEOmetadb.sqlite.gz'
Error in download.file(url_geo, destfile = localfile, mode = "wb") :
  cannot open URL 'http://dl.dropbox.com/u/51653511/GEOmetadb.sqlite.gz'

Can you please check this?


I don't see that (R 3.1.1, Bioconductor 2.14, GEOmetadb 1.24.0):

getSQLiteFile()
trying URL 'http://gbnci.abcc.ncifcrf.gov/geo/GEOmetadb.sqlite.gz'
Content type 'text/plain; charset=ISO-8859-1' length 230197789 bytes (219.5 Mb)
opened URL
===========
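## For your purpose, I would try something like the following in R with the GEOmetadb package:
library(GEOmetadb)  # provides getSQLiteFile()
library(RSQLite)    # provides SQLite() and the dbConnect()/dbGetQuery() functions used below
getSQLiteFile()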
file.info('GEOmetadb.sqlite')
#                        size isdir mode               mtime               ctime               atime        uid        gid   uname            grname
# GEOmetadb.sqlite 3282480128 FALSE  644 2014-09-16 11:36:22 2014-09-16 11:36:22 2014-09-16 11:35:40 1612422931 1360859114 zhujack NIH\\Domain Users

con <- dbConnect(SQLite(),'GEOmetadb.sqlite')
geo_tables <- dbListTables(con)
geo_tables
# [1] "gds"               "gds_subset"        "geoConvert"        "geodb_column_desc" "gpl"               "gse"               "gse_gpl"
# [8] "gse_gsm"           "gsm"               "metaInfo"          "sMatrix"

dbListFields(con,'gsm')
#  [1] "ID"                     "title"                  "gsm"                    "series_id"              "gpl"
#  [6] "status"                 "submission_date"        "last_update_date"       "type"                   "source_name_ch1"
# [11] "organism_ch1"           "characteristics_ch1"    "molecule_ch1"           "label_ch1"              "treatment_protocol_ch1"
# [16] "extract_protocol_ch1"   "label_protocol_ch1"     "source_name_ch2"        "organism_ch2"           "characteristics_ch2"
# [21] "molecule_ch2"           "label_ch2"              "treatment_protocol_ch2" "extract_protocol_ch2"   "label_protocol_ch2"
# [26] "hyb_protocol"           "description"            "data_processing"        "contact"                "supplementary_file"
# [31] "data_row_count"         "channel_count"

dbListFields(con,'gpl')
#  [1] "ID"                   "title"                "gpl"                  "status"               "submission_date"
#  [6] "last_update_date"     "technology"           "distribution"         "organism"             "manufacturer"
# [11] "manufacture_protocol" "coating"              "catalog_number"       "support"              "description"
# [16] "web_link"             "contact"              "data_row_count"       "supplementary_file"   "bioc_package"

## You can join different tables here and refine the query with different terms
rs <- dbGetQuery(con,paste("select gpl.manufacturer,gsm.gpl,",
                            "gpl.organism,gpl.title as gpl_title,gsm,",
                            "gsm.title as gsm_title,gsm.series_id ",
                            "from gsm join gpl on gsm.gpl=gpl.gpl",
                            "where gpl.manufacturer='Affymetrix' ",
                            "and gpl.organism = 'Homo sapiens' ",
                            "and gpl.description like '%tiling%'"))

dim(rs)
# [1] 3483    7

head(rs)
#  manufacturer     gpl     organism                                                  gpl_title      gsm
# 1   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84453
# 2   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84454
# 3   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84455
# 4   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84456
# 5   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84495
# 6   Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84496
#                                                                                                       gsm_title series_id
# 1  H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 0 hours (NCBI build 35) - strict analysis parameters   GSE3658
# 2  H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 2 hours (NCBI build 35) - strict analysis parameters   GSE3658
# 3  H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 8 hours (NCBI build 35) - strict analysis parameters   GSE3658
# 4 H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 32 hours (NCBI build 35) - strict analysis parameters   GSE3658
# 5     HisH4 ChIP from Retinoic Acid Stimulated HL60 Cells, 0 hours (NCBI build 35) - strict analysis parameters   GSE3659
# 6     HisH4 ChIP from Retinoic Acid Stimulated HL60 Cells, 2 hours (NCBI build 35) - strict analysis parameters   GSE3659

## close the connection
dbDisconnect(con)
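
For the tissue part of the question, the same join can be narrowed with one more condition. The sketch below is only an illustration: it builds a tiny in-memory database with the same table and column names as above but invented rows and made-up accessions (GPL_A, GSM_1, ...), and it assumes that tissue is described in gsm.source_name_ch1, which is free text and will not hold for every submission (characteristics_ch1 is another place to look). Against the real file you would connect to 'GEOmetadb.sqlite' instead of ":memory:".

```r
library(DBI)
library(RSQLite)

# Toy stand-in for GEOmetadb.sqlite: real table/column names, invented rows.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gpl", data.frame(
  gpl          = c("GPL_A", "GPL_B"),
  manufacturer = c("Affymetrix", "Illumina"),
  organism     = c("Homo sapiens", "Homo sapiens"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gsm", data.frame(
  gsm             = c("GSM_1", "GSM_2", "GSM_3"),
  gpl             = c("GPL_A", "GPL_A", "GPL_B"),
  series_id       = c("GSE_X", "GSE_X", "GSE_Y"),
  source_name_ch1 = c("brain, frontal cortex", "liver", "brain"),
  stringsAsFactors = FALSE))

# Parameterised query: manufacturer, organism and tissue pattern are bound
# as values, so the same statement can be reused with other terms.
sql <- paste(
  "SELECT gsm.gsm, gsm.series_id, gsm.source_name_ch1, gpl.manufacturer",
  "FROM gsm JOIN gpl ON gsm.gpl = gpl.gpl",
  "WHERE gpl.manufacturer = ?",
  "AND gpl.organism = ?",
  "AND gsm.source_name_ch1 LIKE ?")
hits <- dbGetQuery(con, sql,
                   params = list("Affymetrix", "Homo sapiens", "%brain%"))
print(hits)

dbDisconnect(con)
```

Binding the values with params instead of pasting them into the SQL string avoids quoting problems when a search term contains an apostrophe.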
10.2 years ago
Neilfws 49k

There is no specific filter for platform manufacturer, nor for tissue. The keys that you can use to filter are listed in this text file.

I'd guess that words like Affymetrix or Illumina are specific enough by themselves in most cases, without the need for qualifiers.
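
To make that concrete, here is a rough sketch of how such a search could be assembled in R, with the platform and tissue words added as plain unqualified terms next to the qualified ones from the question. It only builds the query string and an NCBI E-utilities esearch URL for the gds database; it does not send the request. The retmax value is an arbitrary choice for illustration, and whether unqualified terms are specific enough for a given search is something to check against the results.

```r
# Qualified terms come from the question; "Affymetrix" and "brain" are
# added as free-text terms because no platform or tissue field is available.
terms <- c('"Homo sapiens"[Organism]',
           '"Methylation profiling by genome tiling array"[Filter]',
           'Affymetrix',
           'brain')
query <- paste(terms, collapse = " AND ")

# esearch URL for GEO DataSets (db=gds); retmax=20 is an arbitrary example.
url <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
              "?db=gds&retmax=20&term=",
              utils::URLencode(query, reserved = TRUE))

cat(query, "\n")
cat(url, "\n")
```

The query string itself can also be pasted directly into the search box at http://www.ncbi.nlm.nih.gov/gds.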
