Hello everybody!
I just started working with data from the Affymetrix GeneChip Mouse Gene 1.0 ST Array and I have two questions about that and hope that somebody can help me.
1. The first is a more general and probably very easy question: which IDs should I use to map Affymetrix probeset_ids or transcript_cluster_ids to genes? I found a lot of questions and very good answers here about several probeset_ids/transcript_cluster_ids being matched to the same gene, etc. I don't have a problem using different R packages (biomaRt, xmapcore, mogene10sttranscriptcluster.db etc.) to match the Affymetrix IDs to Gene Symbol, Ensembl, Entrez Gene, Unigene IDs etc. But my question is which of these IDs I should use to determine that two (or more) transcript_cluster_ids are matched to the same gene. In other words, what ID "type" is the standard for saying that two probesets are assigned to the same gene?
I guess for most of the genes it shouldn't make a difference which ID type I use but for some probesets the annotation is different and the probesets with missing annotations are different for the different types. I saw that others used the Gene symbols (what I would have used) but I also saw the use of Unigene IDs...
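To make the concern concrete, here is a minimal base-R sketch of how the choice of ID type can change which transcript clusters count as "the same gene". The table and every ID in it are made-up placeholders, not real annotation, and the helper name shared_clusters is hypothetical; in practice the columns would come from one of the annotation packages mentioned above.

```r
# Toy annotation table: every value below is a made-up placeholder.
annot <- data.frame(
  transcript_cluster_id = c("tc1", "tc2", "tc3", "tc4"),
  symbol = c("GeneA", "GeneA", NA,    "GeneB"),
  entrez = c("101",   "101",   "101", NA),
  stringsAsFactors = FALSE
)

# Return the groups of transcript clusters that share one value of `id_col`,
# ignoring clusters that have no annotation for that ID type.
shared_clusters <- function(annot, id_col) {
  ok <- !is.na(annot[[id_col]])
  groups <- split(annot$transcript_cluster_id[ok], annot[[id_col]][ok])
  groups[lengths(groups) > 1]
}

by_symbol <- shared_clusters(annot, "symbol")
by_entrez <- shared_clusters(annot, "entrez")

# Clusters left without annotation under each ID type.
unannotated <- lapply(annot[c("symbol", "entrez")],
                      function(x) annot$transcript_cluster_id[is.na(x)])
```

In this toy table, grouping by symbol puts tc1 and tc2 together, while grouping by the Entrez column also pulls in tc3, and a different cluster is left unannotated in each case. That is exactly the kind of disagreement between ID types the question is about.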
2. I am using R and used ReadAffy to get an AffyBatch object from the CEL files that I got from GEO. On GEO the platform is described as [MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version].
From Affymetrix I downloaded two annotation files: MoGene-1_0-st-v1.na30.mm9.transcript.csv and MoGene-1_0-st-v1.na30.mm9.probeset.csv
I now wanted to match the data in my AffyBatch object to the probeset_ids in the probeset annotation file. When I try probeNames(AffyBatch), I get the transcript_cluster_ids for each row in the intensity matrix (these are equal to the probeset_ids in the transcript annotation file) but not the probeset_ids from the probeset annotation file. Is the information about the probeset_ids from the probeset annotation file not stored in my AffyBatch object because the CEL files are from a "transcript (gene) version", or what am I doing wrong?
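The kind of lookup being asked for could be sketched in base R as below. This is only an illustration under assumptions: it assumes the probeset annotation table has both a probeset_id and a transcript_cluster_id column, and it uses a small made-up table and made-up IDs instead of the real file and the real AffyBatch object.

```r
# Stand-in for the probeset annotation table; in practice it might be read with
# read.csv("MoGene-1_0-st-v1.na30.mm9.probeset.csv", comment.char = "#").
# The IDs and the two column names used here are assumptions for illustration.
probeset_annot <- data.frame(
  probeset_id           = c("ps1", "ps2", "ps3"),
  transcript_cluster_id = c("tc1", "tc1", "tc2"),
  stringsAsFactors = FALSE
)

# Stand-in for the unique transcript_cluster_ids reported for the AffyBatch object.
cluster_ids <- c("tc1", "tc2", "tc9")

# List the probeset_ids that belong to each transcript cluster.
probesets_by_cluster <- split(probeset_annot$probeset_id,
                              probeset_annot$transcript_cluster_id)

# Look up each cluster; clusters absent from the annotation come back as NULL.
lookup <- setNames(lapply(cluster_ids, function(id) probesets_by_cluster[[id]]),
                   cluster_ids)
```

Whether the rows of the intensity matrix can be traced back to individual probeset_ids this way depends on how the CEL files were read in, so the sketch only shows the table lookup itself.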
Thank you very much for your help!
Sandra
Hi,
I'm new to microarray analysis and I'm having similar trouble with HuGene-1_0-st-v1 .CEL files, which I want to use for gene analysis and pathway analysis. My file has the following probeset_id, and a similar transcript_cluster:
and I always see something like that:
Is it because the HuGene arrays have this probeset_id? Or is there some way to get the second probeset_id with suffixes? I have a big problem understanding the IDs: how probesets and transcripts relate to genes or exons. Thanks in advance!!