In the past I worked with a gene expression dataset generated on an Affymetrix platform, and I was able to use the getBM() function from the biomaRt Bioconductor package to retrieve the genes associated with it.
This is the R code I used:
# Gene list
library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
# Map the Affymetrix probe IDs to Ensembl gene IDs, biotypes and gene names
thisAnnotLookup <- getBM(mart=mart, attributes=c("affy_hugene_1_0_st_v1", "ensembl_gene_id", "gene_biotype", "external_gene_name"), filters="affy_hugene_1_0_st_v1", values=rownames(thisGSetExprss), uniqueRows=TRUE)
And everything worked. Now I am working on another microarray dataset, generated with PAXgene, and I am trying to understand how to retrieve the genes associated with it. The platform used is RNG-MRC_HU25k_STRASBOURG, which I have not found in BioMart.
What can I do?
Thanks!
-- Davide
EDIT: These are the fields present in my GEO object in R:
> str(gset)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
.. .. ..@ name : chr "Yvan,,Devaux"
.. .. ..@ lab : chr ""
.. .. ..@ contact : chr "yvan.devaux@lih.lu"
.. .. ..@ title : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
.. .. ..@ abstract : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
.. .. ..@ url : chr "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11947"
.. .. ..@ pubMedIds : chr "20462429\n20414696\n20300185"
.. .. ..@ samples : list()
.. .. ..@ hybridizations : list()
.. .. ..@ normControls : list()
.. .. ..@ preprocessing : list()
.. .. ..@ other :List of 23
.. .. .. ..$ contact_address : chr "120 route d'Arlon"
.. .. .. ..$ contact_city : chr "Luxembourg"
.. .. .. ..$ contact_country : chr "Luxembourg"
.. .. .. ..$ contact_email : chr "yvan.devaux@lih.lu"
.. .. .. ..$ contact_institute : chr "LIH"
.. .. .. ..$ contact_laboratory : chr "Cardiovascular Research Unit"
.. .. .. ..$ contact_name : chr "Yvan,,Devaux"
.. .. .. ..$ contact_zip/postal_code: chr "1150"
.. .. .. ..$ geo_accession : chr "GSE11947"
.. .. .. ..$ last_update_date : chr "Mar 19 2012"
.. .. .. ..$ overall_design : chr "The 32 patients of this study were divided in 2 groups corresponding to the extreme quartiles of FE values. The"| __truncated__
.. .. .. ..$ platform_id : chr "GPL1947"
.. .. .. ..$ platform_taxid : chr "9606"
.. .. .. ..$ pubmed_id : chr "20462429\n20414696\n20300185"
.. .. .. ..$ relation : chr "BioProject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA105803"
.. .. .. ..$ sample_id : chr "GSM302309 GSM302310 GSM302311 GSM302312 GSM302313 GSM302314 GSM302315 GSM302316 GSM302317 GSM302318 GSM302319 G"| __truncated__
.. .. .. ..$ sample_taxid : chr "9606"
.. .. .. ..$ status : chr "Public on May 25 2010"
.. .. .. ..$ submission_date : chr "Jul 01 2008"
.. .. .. ..$ summary : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
.. .. .. ..$ supplementary_file : chr "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE11nnn/GSE11947/suppl/GSE11947_RAW.tar"
.. .. .. ..$ title : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
.. .. .. ..$ type : chr "Expression profiling by array"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 2
.. .. .. .. .. ..$ : int [1:3] 1 0 0
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ assayData :<environment: 0x562675095e10>
..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 69 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr [1:69] NA NA NA NA ...
.. .. ..@ data :'data.frame': 32 obs. of 69 variables:
.. .. .. ..$ title : Factor w/ 32 levels "BL 708","DA 706",..: 27 26 18 6 11 28 7 2 3 17 ...
.. .. .. ..$ geo_accession : chr [1:32] "GSM302309" "GSM302310" "GSM302311" "GSM302312" ...
.. .. .. ..$ status : Factor w/ 1 level "Public on May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ submission_date : Factor w/ 1 level "Jul 01 2008": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ last_update_date : Factor w/ 1 level "May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ type : Factor w/ 1 level "RNA": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ channel_count : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ source_name_ch1 : Factor w/ 12 levels "BL 708","HJ687",..: 11 12 6 12 12 12 12 12 12 5 ...
.. .. .. ..$ organism_ch1 : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ characteristics_ch1 : Factor w/ 12 levels "Labeling_reference:BL 708",..: 11 12 6 12 12 12 12 12 12 5 ...
.. .. .. ..$ characteristics_ch1.1 : Factor w/ 4 levels "Extraction_reference: PAXgene",..: 1 4 1 4 4 4 4 4 4 1 ...
.. .. .. ..$ characteristics_ch1.2 : Factor w/ 15 levels "Sample_reference: BL 708",..: 11 13 6 14 13 15 14 15 14 5 ...
.. .. .. ..$ characteristics_ch1.3 : Factor w/ 13 levels "Subject_reference: BL 708",..: 11 12 6 12 12 13 12 13 12 5 ...
.. .. .. ..$ characteristics_ch1.4 : Factor w/ 4 levels "","Tissue: blood",..: 3 2 3 2 2 2 2 2 2 3 ...
.. .. .. ..$ characteristics_ch1.5 : Factor w/ 3 levels "","Extraction_amount: 10.0",..: 3 3 3 3 3 2 3 2 3 3 ...
.. .. .. ..$ characteristics_ch1.6 : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
.. .. .. ..$ molecule_ch1 : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ extract_protocol_ch1 : Factor w/ 2 levels "Qiagen","Trizol": 1 2 1 2 2 2 2 2 2 1 ...
.. .. .. ..$ label_ch1 : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ label_protocol_ch1 : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ taxid_ch1 : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ source_name_ch2 : Factor w/ 22 levels "DA 706","FC 732",..: 19 16 19 4 8 17 5 1 2 19 ...
.. .. .. ..$ organism_ch2 : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ characteristics_ch2 : Factor w/ 22 levels "Labeling_reference:DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
.. .. .. ..$ characteristics_ch2.1 : Factor w/ 3 levels "Extraction_reference: L62 VN",..: 3 2 3 2 2 2 2 2 2 3 ...
.. .. .. ..$ characteristics_ch2.2 : Factor w/ 25 levels "Sample_reference: DA 706",..: 21 16 21 4 8 17 5 1 2 20 ...
.. .. .. ..$ characteristics_ch2.3 : Factor w/ 23 levels "Subject_reference: DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
.. .. .. ..$ characteristics_ch2.4 : Factor w/ 2 levels "Tissue: blood",..: 1 1 1 2 1 2 2 2 2 2 ...
.. .. .. ..$ characteristics_ch2.5 : Factor w/ 2 levels "Extraction_amount: 10.0",..: 2 2 2 2 2 1 2 1 2 2 ...
.. .. .. ..$ characteristics_ch2.6 : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
.. .. .. ..$ molecule_ch2 : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ extract_protocol_ch2 : Factor w/ 2 levels "Qiagen","Trizol": 2 1 2 1 1 1 1 1 1 1 ...
.. .. .. ..$ label_ch2 : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ label_protocol_ch2 : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ taxid_ch2 : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ hyb_protocol : Factor w/ 1 level "Agilent : 750.0 ng at 60 degree_C during 17 hours": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ scan_protocol : Factor w/ 1 level "Scanned on an GenePix 4000B fluorescent scanner.": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ scan_protocol.1 : Factor w/ 1 level "Image intensity data were extracted with GenePix Pro 6.0 analysis software.": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ description : Factor w/ 18 levels "ejection fraction (EF): 20",..: 18 18 17 16 16 15 15 14 14 13 ...
.. .. .. ..$ description.1 : Factor w/ 3 levels "group: B","group: A",..: 3 3 3 3 3 3 3 3 3 3 ...
.. .. .. ..$ data_processing : Factor w/ 1 level "Lowess non linear normalization": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ platform_id : Factor w/ 1 level "GPL1947": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_name : Factor w/ 1 level "Yvan,,Devaux": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_email : Factor w/ 1 level "yvan.devaux@lih.lu": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_laboratory : Factor w/ 1 level "Cardiovascular Research Unit": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_institute : Factor w/ 1 level "LIH": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_address : Factor w/ 1 level "120 route d'Arlon": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_city : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_zip/postal_code : Factor w/ 1 level "1150": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ contact_country : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ supplementary_file : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29921.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. ..$ supplementary_file.1 : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29923.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. ..$ supplementary_file.2 : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30105.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. ..$ supplementary_file.3 : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30107.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. .. ..$ data_row_count : Factor w/ 1 level "16238": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ Extraction_amount:ch1 : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
.. .. .. ..$ Extraction_amount:ch2 : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
.. .. .. ..$ Extraction_reference:ch1: chr [1:32] "PAXgene" "Trizol" "PAXgene" "Trizol" ...
.. .. .. ..$ Extraction_reference:ch2: chr [1:32] "Trizol" "PAXgene" "Trizol" "PAXgene" ...
.. .. .. ..$ Labeling_reference:ch1 : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
.. .. .. ..$ Labeling_reference:ch2 : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
.. .. .. ..$ RNA_quality:ch1 : chr [1:32] "null" "null" "null" "null" ...
.. .. .. ..$ RNA_quality:ch2 : chr [1:32] "null" "null" "null" "null" ...
.. .. .. ..$ Sample_reference:ch1 : chr [1:32] "L88-TG" "Ref" "L38 DP" "REF" ...
.. .. .. ..$ Sample_reference:ch2 : chr [1:32] "REF" "L67-SR" "REF" "KF 692" ...
.. .. .. ..$ Subject_reference:ch1 : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
.. .. .. ..$ Subject_reference:ch2 : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
.. .. .. ..$ Tissue:ch1 : chr [1:32] "Blood" "blood" "Blood" "blood" ...
.. .. .. ..$ Tissue:ch2 : chr [1:32] "blood" "blood" "blood" "Blood" ...
.. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr(0)
.. .. ..@ data :'data.frame': 16238 obs. of 0 variables
.. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ annotation : chr "GPL1947"
..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr(0)
.. .. ..@ data :'data.frame': 32 obs. of 0 variables
.. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. ..@ .Data:List of 4
.. .. .. ..$ : int [1:3] 3 6 0
.. .. .. ..$ : int [1:3] 2 44 0
.. .. .. ..$ : int [1:3] 1 3 0
.. .. .. ..$ : int [1:3] 1 0 0
Hey Davide, I have never heard of that array, but perhaps you can get the annotation that you need from the main page for this array on GEO (platform accession GPL1947)?
Thanks Kevin. I saw that page, but I cannot understand how to access those data header fields. Do you know how I can do that?
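One possible way to read those header fields from R is the GEOquery package, which can download the platform record (GPL1947, the platform_id shown in the str() output above) and expose its data table as a data frame. What follows is only a sketch and has not been run against this particular platform. The helper names fetch_platform_table and annotate_features are made up for illustration. It also assumes that the platform table has GEO's usual "ID" column and that the feature names of gset use the same identifiers.

```r
# Sketch only: assumes the GEOquery and Biobase packages are installed and
# that GEO serves a full data table for the requested platform.

# Download a GEO platform record and return its data table
# (one row per array feature, one column per header field).
fetch_platform_table <- function(gpl_id) {
  gpl <- GEOquery::getGEO(gpl_id)  # a GPL object
  GEOquery::Table(gpl)             # the platform's data table as a data frame
}

# Reorder the platform table so that row i describes feature_ids[i].
# id_col is the column holding the feature IDs (assumed to be "ID").
# Features that are missing from the platform table get a row of NAs.
annotate_features <- function(feature_ids, platform_table, id_col = "ID") {
  idx <- match(as.character(feature_ids),
               as.character(platform_table[[id_col]]))
  annot <- platform_table[idx, , drop = FALSE]
  rownames(annot) <- NULL
  annot
}

# Usage (needs network access, so it is left commented out):
# platformTable <- fetch_platform_table("GPL1947")
# colnames(platformTable)  # see which header fields the platform provides
# thisAnnotLookup <- annotate_features(Biobase::featureNames(gset), platformTable)
```

Which columns carry gene names or accessions depends on what the submitter uploaded, so it is worth inspecting colnames(platformTable) before relying on any of them. Another option that may work is downloading the series again with getGEO("GSE11947", GSEMatrix = TRUE, getGPL = TRUE), which should attach the platform table to the featureData slot that is empty in the str() output above.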