GEO dataset extraction
14 months ago
newtostats • 0

Hi everyone,

It is my first time using GEO datasets for a research project of mine.

Our goal is to find expression fold changes in skin diseases by looking at lesioned skin vs non-lesioned skin.

After downloading the data, I noticed there's an ID_REF value that corresponds to a gene of interest. When I downloaded the source data to see which gene each ID_REF corresponds to, I found that multiple ID_REF values map to the same gene. They're all present in the dataset and have different expression values, and I'm not really sure why.

Any help would be greatly appreciated!

ncbi geo • 1.0k views

It would help if you could share the GEO accession number of the data set in question. Kind of hard to make guesses without knowing more about the technology, the experiment, the assay, etc. represented by the GEO record.


Thank you! I'm looking at GSE226244.

Specifically, I was looking for PPARalpha expression, which gave me all of the following ID_REF numbers: 1558631_at, 1560981_a_at, 206870_at, 210771_at, 223437_at, 223438_s_at, 226978_at, 237142_at, 244689_at.


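In microarray datasets, the "ID_REF" values typically correspond to probe sets, which are groups of short oligonucleotide probes designed to target a specific gene or transcript. Multiple probe sets can target the same gene, which is why one gene can appear under several ID_REF values with different expression values.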

14 months ago
seidel 11k

> I'm looking at GSE226244
>
> Specifically I was looking for PPARalpha expression, which gave me all of the following ID_REF numbers: 1558631_at, 1560981_a_at, 206870_at, 210771_at, 223437_at, 223438_s_at, 226978_at, 237142_at, 244689_at

Those are probe set IDs for the Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2). There's a link on the GEO record pointing back to the Affymetrix site, which may have more information. Affymetrix is owned by Thermo Fisher now, but there used to be something called NetAffx that gave information about probe mappings. It looks like some kind of NetAffx resource is still available there.

As Hamid Ghaedi mentioned, Affy uses probe sets to target a gene (typically 11 different probes, along with a matching set of mismatch probes in which 1 base is different). The design targets were originally cDNAs, which represent transcripts that may or may not originate from the same locus. The probe sets are also named according to how likely they are to cross-hybridize with other design targets (notice the _at, _s_at, _a_at, _x_at, etc.). As genome annotations change or improve, the probe set mappings may change, which is why NetAffx is supposed to exist - so you can know which genes may be represented by the array results.
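
To make the naming point concrete, here is a minimal sketch that groups the probe set IDs from the question by suffix. The one-line descriptions are my paraphrase of the usual Affymetrix conventions, not an official table, and the function name classify_probe_set is just something I made up for illustration. Check the array annotation for the authoritative meaning of each suffix.

```python
# Suffixes are checked longest-first so "_s_at" is not mistaken for plain "_at".
# The notes are informal paraphrases (an assumption), not vendor definitions.
SUFFIX_NOTES = [
    ("_x_at", "probes may cross-hybridize with unrelated sequences"),
    ("_s_at", "probes shared by transcripts from more than one gene"),
    ("_a_at", "probes match alternative transcripts of one gene"),
    ("_at", "probes expected to be specific to one design target"),
]


def classify_probe_set(probe_id):
    """Return (suffix, note) for a probe set ID, or ("unknown", "") if no suffix matches."""
    for suffix, note in SUFFIX_NOTES:
        if probe_id.endswith(suffix):
            return suffix, note
    return "unknown", ""


# Probe set IDs listed in the question for PPARalpha.
ppara_ids = [
    "1558631_at", "1560981_a_at", "206870_at", "210771_at", "223437_at",
    "223438_s_at", "226978_at", "237142_at", "244689_at",
]

for probe_id in ppara_ids:
    suffix, note = classify_probe_set(probe_id)
    print(f"{probe_id}\t{suffix}\t{note}")
```

For the nine IDs in the question, seven end in plain _at, one in _a_at and one in _s_at, so the _s_at probe set is the one most worth checking for shared signal.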

Different people have different schemes for how to deal with multiple probe sets mapping to a gene. Some use averaging, others allow any of the matching probe sets to represent potential activity of a gene, and work out cross-hybridization issues downstream.
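
As a rough sketch of those two schemes, the snippet below averages probe sets per gene and, alternatively, just lists every probe set per gene so each can be inspected on its own. The expression values and the two-sample layout are made up for illustration (they are not from GSE226244), the gene symbol PPARA is assumed from the question, and the helper names are hypothetical. It uses only the Python standard library.

```python
from collections import defaultdict
from statistics import mean

# Made-up log2 expression values for two hypothetical samples; illustration only.
expression = {
    "206870_at": [7.0, 8.0],
    "223438_s_at": [5.0, 6.0],
    "226978_at": [9.0, 10.0],
}

# Probe set to gene mapping; in practice this comes from the platform annotation.
probe_to_gene = {probe_id: "PPARA" for probe_id in expression}


def average_per_gene(expression, probe_to_gene):
    """Scheme 1: average all probe sets of a gene, sample by sample."""
    rows_by_gene = defaultdict(list)
    for probe_id, values in expression.items():
        rows_by_gene[probe_to_gene[probe_id]].append(values)
    # zip(*rows) walks the samples column by column across the gene's probe sets.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


def probes_per_gene(probe_to_gene):
    """Scheme 2: keep every probe set, grouped by gene, for separate inspection."""
    grouped = defaultdict(list)
    for probe_id, gene in probe_to_gene.items():
        grouped[gene].append(probe_id)
    return dict(grouped)


print(average_per_gene(expression, probe_to_gene))
print(probes_per_gene(probe_to_gene))
```

Averaging gives one number per gene per sample but can dilute a good probe set with a poor one. Keeping them separate avoids that at the cost of more rows to interpret.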

Another place you can find Affy ID to gene or transcript mappings is Ensembl, using BioMart.
