How to retrieve the genes associated with an RNA PAXgene gene expression dataset from GEO?
5.2 years ago
Davide Chicco ▴ 120

In the past I worked with a gene expression dataset generated on an Affymetrix platform, and I was able to use the getBM() function from the biomaRt Bioconductor package to retrieve the genes associated with it.

This is the R code I used:

# Gene list
library(biomaRt)

# Connect to Ensembl and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Map the Affymetrix probe IDs (row names of the expression matrix) to genes
thisAnnotLookup <- getBM(mart=mart, attributes=c("affy_hugene_1_0_st_v1", "ensembl_gene_id", "gene_biotype", "external_gene_name"), filters="affy_hugene_1_0_st_v1", values=rownames(thisGSetExprss), uniqueRows=TRUE)

And everything worked. Now I am working on another microarray dataset, generated with PAXgene, and I am trying to understand how to retrieve the genes associated with it. The platform they used is RNG-MRC_HU25k_STRASBOURG, which I have not found in BioMart.

What can I do?

Thanks!

-- Davide

EDIT: These are the fields present in my GEO variable in R:

> str(gset)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr "Yvan,,Devaux"
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr "yvan.devaux@lih.lu"
  .. .. ..@ title            : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
  .. .. ..@ abstract         : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
  .. .. ..@ url              : chr "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11947"
  .. .. ..@ pubMedIds        : chr "20462429\n20414696\n20300185"
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            :List of 23
  .. .. .. ..$ contact_address        : chr "120 route d'Arlon"
  .. .. .. ..$ contact_city           : chr "Luxembourg"
  .. .. .. ..$ contact_country        : chr "Luxembourg"
  .. .. .. ..$ contact_email          : chr "yvan.devaux@lih.lu"
  .. .. .. ..$ contact_institute      : chr "LIH"
  .. .. .. ..$ contact_laboratory     : chr "Cardiovascular Research Unit"
  .. .. .. ..$ contact_name           : chr "Yvan,,Devaux"
  .. .. .. ..$ contact_zip/postal_code: chr "1150"
  .. .. .. ..$ geo_accession          : chr "GSE11947"
  .. .. .. ..$ last_update_date       : chr "Mar 19 2012"
  .. .. .. ..$ overall_design         : chr "The 32 patients of this study were divided in 2 groups corresponding to the extreme quartiles of FE values. The"| __truncated__
  .. .. .. ..$ platform_id            : chr "GPL1947"
  .. .. .. ..$ platform_taxid         : chr "9606"
  .. .. .. ..$ pubmed_id              : chr "20462429\n20414696\n20300185"
  .. .. .. ..$ relation               : chr "BioProject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA105803"
  .. .. .. ..$ sample_id              : chr "GSM302309 GSM302310 GSM302311 GSM302312 GSM302313 GSM302314 GSM302315 GSM302316 GSM302317 GSM302318 GSM302319 G"| __truncated__
  .. .. .. ..$ sample_taxid           : chr "9606"
  .. .. .. ..$ status                 : chr "Public on May 25 2010"
  .. .. .. ..$ submission_date        : chr "Jul 01 2008"
  .. .. .. ..$ summary                : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
  .. .. .. ..$ supplementary_file     : chr "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE11nnn/GSE11947/suppl/GSE11947_RAW.tar"
  .. .. .. ..$ title                  : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
  .. .. .. ..$ type                   : chr "Expression profiling by array"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData        :<environment: 0x562675095e10> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 69 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr [1:69] NA NA NA NA ...
  .. .. ..@ data             :'data.frame': 32 obs. of  69 variables:
  .. .. .. ..$ title                   : Factor w/ 32 levels "BL 708","DA 706",..: 27 26 18 6 11 28 7 2 3 17 ...
  .. .. .. ..$ geo_accession           : chr [1:32] "GSM302309" "GSM302310" "GSM302311" "GSM302312" ...
  .. .. .. ..$ status                  : Factor w/ 1 level "Public on May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ submission_date         : Factor w/ 1 level "Jul 01 2008": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ last_update_date        : Factor w/ 1 level "May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ type                    : Factor w/ 1 level "RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ channel_count           : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ source_name_ch1         : Factor w/ 12 levels "BL 708","HJ687",..: 11 12 6 12 12 12 12 12 12 5 ...
  .. .. .. ..$ organism_ch1            : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ characteristics_ch1     : Factor w/ 12 levels "Labeling_reference:BL 708",..: 11 12 6 12 12 12 12 12 12 5 ...
  .. .. .. ..$ characteristics_ch1.1   : Factor w/ 4 levels "Extraction_reference: PAXgene",..: 1 4 1 4 4 4 4 4 4 1 ...
  .. .. .. ..$ characteristics_ch1.2   : Factor w/ 15 levels "Sample_reference: BL 708",..: 11 13 6 14 13 15 14 15 14 5 ...
  .. .. .. ..$ characteristics_ch1.3   : Factor w/ 13 levels "Subject_reference: BL 708",..: 11 12 6 12 12 13 12 13 12 5 ...
  .. .. .. ..$ characteristics_ch1.4   : Factor w/ 4 levels "","Tissue: blood",..: 3 2 3 2 2 2 2 2 2 3 ...
  .. .. .. ..$ characteristics_ch1.5   : Factor w/ 3 levels "","Extraction_amount: 10.0",..: 3 3 3 3 3 2 3 2 3 3 ...
  .. .. .. ..$ characteristics_ch1.6   : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ molecule_ch1            : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ extract_protocol_ch1    : Factor w/ 2 levels "Qiagen","Trizol": 1 2 1 2 2 2 2 2 2 1 ...
  .. .. .. ..$ label_ch1               : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_protocol_ch1      : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ taxid_ch1               : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ source_name_ch2         : Factor w/ 22 levels "DA 706","FC 732",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ organism_ch2            : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ characteristics_ch2     : Factor w/ 22 levels "Labeling_reference:DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ characteristics_ch2.1   : Factor w/ 3 levels "Extraction_reference: L62 VN",..: 3 2 3 2 2 2 2 2 2 3 ...
  .. .. .. ..$ characteristics_ch2.2   : Factor w/ 25 levels "Sample_reference: DA 706",..: 21 16 21 4 8 17 5 1 2 20 ...
  .. .. .. ..$ characteristics_ch2.3   : Factor w/ 23 levels "Subject_reference: DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ characteristics_ch2.4   : Factor w/ 2 levels "Tissue: blood",..: 1 1 1 2 1 2 2 2 2 2 ...
  .. .. .. ..$ characteristics_ch2.5   : Factor w/ 2 levels "Extraction_amount: 10.0",..: 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ characteristics_ch2.6   : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ molecule_ch2            : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ extract_protocol_ch2    : Factor w/ 2 levels "Qiagen","Trizol": 2 1 2 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_ch2               : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_protocol_ch2      : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ taxid_ch2               : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ hyb_protocol            : Factor w/ 1 level "Agilent : 750.0 ng at 60 degree_C during 17 hours": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ scan_protocol           : Factor w/ 1 level "Scanned on an GenePix 4000B fluorescent scanner.": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ scan_protocol.1         : Factor w/ 1 level "Image intensity data were extracted with GenePix Pro 6.0 analysis software.": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ description             : Factor w/ 18 levels "ejection fraction (EF): 20",..: 18 18 17 16 16 15 15 14 14 13 ...
  .. .. .. ..$ description.1           : Factor w/ 3 levels "group:  B","group: A",..: 3 3 3 3 3 3 3 3 3 3 ...
  .. .. .. ..$ data_processing         : Factor w/ 1 level "Lowess non linear normalization": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ platform_id             : Factor w/ 1 level "GPL1947": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_name            : Factor w/ 1 level "Yvan,,Devaux": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_email           : Factor w/ 1 level "yvan.devaux@lih.lu": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_laboratory      : Factor w/ 1 level "Cardiovascular Research Unit": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_institute       : Factor w/ 1 level "LIH": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_address         : Factor w/ 1 level "120 route d'Arlon": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_city            : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_zip/postal_code : Factor w/ 1 level "1150": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_country         : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ supplementary_file      : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29921.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.1    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29923.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.2    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30105.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.3    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30107.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ data_row_count          : Factor w/ 1 level "16238": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ Extraction_amount:ch1   : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
  .. .. .. ..$ Extraction_amount:ch2   : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
  .. .. .. ..$ Extraction_reference:ch1: chr [1:32] "PAXgene" "Trizol" "PAXgene" "Trizol" ...
  .. .. .. ..$ Extraction_reference:ch2: chr [1:32] "Trizol" "PAXgene" "Trizol" "PAXgene" ...
  .. .. .. ..$ Labeling_reference:ch1  : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
  .. .. .. ..$ Labeling_reference:ch2  : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
  .. .. .. ..$ RNA_quality:ch1         : chr [1:32] "null" "null" "null" "null" ...
  .. .. .. ..$ RNA_quality:ch2         : chr [1:32] "null" "null" "null" "null" ...
  .. .. .. ..$ Sample_reference:ch1    : chr [1:32] "L88-TG" "Ref" "L38 DP" "REF" ...
  .. .. .. ..$ Sample_reference:ch2    : chr [1:32] "REF" "L67-SR" "REF" "KF 692" ...
  .. .. .. ..$ Subject_reference:ch1   : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
  .. .. .. ..$ Subject_reference:ch2   : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
  .. .. .. ..$ Tissue:ch1              : chr [1:32] "Blood" "blood" "Blood" "blood" ...
  .. .. .. ..$ Tissue:ch2              : chr [1:32] "blood" "blood" "blood" "Blood" ...
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 16238 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation       : chr "GPL1947"
  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 32 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 3 6 0
  .. .. .. ..$ : int [1:3] 2 44 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0

Hey Davide, I have never heard of that array, but perhaps you can get the annotation that you need from the GPL1947 platform page on GEO, which is the main page for this array.


Thanks Kevin. I saw that page, but I cannot work out how to access those data header fields. Do you know how I can do that?

5.1 years ago
Davide Chicco ▴ 120

I was able to solve my own problem by checking the arguments of the getGEO() function: the getGPL argument must be set to TRUE, so that the platform (GPL) annotation is downloaded together with the expression data.

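# Install BiocManager and GEOquery if they are not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

BiocManager::install("GEOquery")
library(GEOquery)

GSE_code <- "GSE11947"
# Download the supplementary (raw) files of the series
getGEOSuppFiles(GSE_code)
# getGPL = TRUE also fetches the platform annotation
gset <- getGEO(GSE_code, GSEMatrix = TRUE, getGPL = TRUE)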
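
As a follow-up, here is a minimal sketch of how the annotation could then be read back, assuming the platform table ends up in the feature data of the ExpressionSet (which is what getGPL = TRUE is meant to do). The helper lookup_probe_annotation() and the toy table are hypothetical and only for illustration. I am not assuming any particular column names for GPL1947, so the real-data part (left commented out because it needs network access) just lists whatever columns are there.

```r
# Hypothetical helper (not part of GEOquery or Biobase): return the annotation
# rows for a set of probe IDs from a feature table whose row names are probe IDs.
lookup_probe_annotation <- function(feature_table, probe_ids) {
  stopifnot(is.data.frame(feature_table))
  found <- probe_ids %in% rownames(feature_table)
  if (any(!found)) {
    warning(sum(!found), " probe ID(s) not present in the feature table")
  }
  feature_table[probe_ids[found], , drop = FALSE]
}

# Toy feature table standing in for fData(); IDs and column names are made up.
toy_features <- data.frame(
  ID = c("1", "2", "3"),
  GENE_SYMBOL = c("GENE_A", "GENE_B", NA),
  row.names = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
toy_hits <- lookup_probe_annotation(toy_features, c("2", "3"))
print(toy_hits)

# With the real data (requires network access, GEOquery and Biobase):
# library(GEOquery)
# library(Biobase)
# gset <- getGEO("GSE11947", GSEMatrix = TRUE, getGPL = TRUE)
# eset <- gset[[1]]        # getGEO() returns a list of ExpressionSet objects
# colnames(fData(eset))    # inspect which annotation columns the platform provides
# lookup_probe_annotation(fData(eset), head(featureNames(eset)))
```

Since the column names differ from platform to platform, it is probably safest to look at colnames(fData(eset)) first and only then pick the column that holds the gene identifiers.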