I used to first normalize Affymetrix microarray data with RMA by 'probeset':
oligo::rma(rawData, background=TRUE, normalize=TRUE, target="probeset")
and then convert probe ids to gene ids with:
select(microarrayPackage, keys = as.character(ids), columns = c('PROBEID','ENSEMBL'), keytype = 'PROBEID')
But now some annotation packages that used to be '.db' have been replaced by 'transcriptcluster.db' and 'probeset.db'. Can I run exactly the same code using 'probeset.db', or should I use 'transcriptcluster.db'?
I know there is a post on this, but I still can't work out which of the two I should use when running the code above.
I am doing this for studies on GEO, so the array platform varies.
Examples:
GSE75918: the array platform is [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]. It works with both transcriptcluster and probeset, but with both there are many NAs. Example ids: "7893430" "7893431" "7893432" "7893433" "7893434"
GSE63296: array [HTA-2_0] Affymetrix Human Transcriptome Array 2.0 [transcript (gene) version]. Example ids: "47419722_st" "47419725_st" "47419729_st" "47419731_st"
GSE66529: array [HuGene-2_0-st] Affymetrix Human Gene 2.0 ST Array [transcript (gene) version]. Example ids: "16651727" "16651729" "16651731" "16651733" "16651735"
And for this last one I can't even find the package on Bioconductor:
GSE55487 with array [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version]
Hey, thanks for sharing that. I would use the following packages for these:
GSE75918 (HuGene-1_0-st): hugene10sttranscriptcluster.db
GSE63296 (HTA-2_0): hta20transcriptcluster.db
GSE66529 (HuGene-2_0-st): hugene20sttranscriptcluster.db
GSE55487 (HuEx-1_0-st): huex10sttranscriptcluster.db
A one-to-one mapping should be achievable via, for example:
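One possible way to do this, as a minimal sketch: `AnnotationDbi::mapIds()` with `multiVals = "first"` keeps a single gene id per probe id. The helper `first_match()` and the toy gene ids below are hypothetical and only illustrate the same idea on a plain data frame; the Bioconductor part runs only if `hugene10sttranscriptcluster.db` is installed.

```r
# Collapse a one-to-many select() result to one row per probe id by keeping
# the first gene id listed for each probe (the same idea as
# AnnotationDbi::mapIds(..., multiVals = "first")).
# first_match() is a hypothetical helper, not part of any package.
first_match <- function(mapping, key = "PROBEID") {
  mapping[!duplicated(mapping[[key]]), , drop = FALSE]
}

# Toy mapping with made-up gene ids, standing in for a real select() result.
toy <- data.frame(
  PROBEID = c("7893430", "7893430", "7893431", "7893432"),
  ENSEMBL = c("ENSG_A", "ENSG_B", NA, "ENSG_C"),
  stringsAsFactors = FALSE
)
one_to_one <- first_match(toy)

# With the real annotation package (skipped if it is not installed):
if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)
  ids <- head(keys(hugene10sttranscriptcluster.db, keytype = "PROBEID"))
  ensembl <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                    column = "ENSEMBL", keytype = "PROBEID",
                    multiVals = "first")
}
```

Keeping the first match is only one policy; probes that map to several genes could instead be dropped or reported, depending on the analysis.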
Thank you for answering. Meanwhile, I tested how many ids were converted successfully with both probeset and transcriptcluster:
1st example - GSE75918: hugene10stprobeset.db: 327424 ids; hugene10sttranscriptcluster.db: 215 ids
2nd example - GSE63296: hta20probeset.db: 0 ids; hta20transcriptcluster.db: 40567 ids
3rd example - GSE66529: hugene20stprobeset.db: 376741 ids; hugene20sttranscriptcluster.db: 0 ids
So in the 1st and 3rd examples, it actually seems to be probeset that converts more ids...
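A comparison like this can be sketched as follows. It is only an illustration: `count_mapped()` is a hypothetical helper, and the toy ids stand in for the data frame returned by `select()`. It reports rows and unique ids separately, since `select()` can return several rows for one probe id and the two counts need not agree.

```r
# Count how many ids received a gene id in a select()-style result.
# count_mapped() is a hypothetical helper, not part of any package.
count_mapped <- function(mapping, key = "PROBEID", column = "ENSEMBL") {
  has_gene <- !is.na(mapping[[column]])
  c(rows_with_gene = sum(has_gene),                                   # rows with a gene id
    ids_with_gene  = length(unique(mapping[[key]][has_gene])),        # distinct ids mapped
    ids_total      = length(unique(mapping[[key]])))                  # distinct ids queried
}

# Toy result with made-up ids: p1 maps to two genes, p2 to none, p3 to one.
toy <- data.frame(
  PROBEID = c("p1", "p1", "p2", "p3"),
  ENSEMBL = c("g1", "g2", NA, "g3"),
  stringsAsFactors = FALSE
)
counts <- count_mapped(toy)
print(counts)
```

Comparing `ids_with_gene` against `ids_total` for each candidate annotation package gives a like-for-like measure of how well the package matches the ids in the expression matrix.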
It will depend on how you are summarising the data during RMA normalisation. You seem to be summarising at the level of the probe set, so, the probeset annotation will be needed.
Unless you have good justification, you should be using XYXtranscriptcluster.db with:
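A minimal sketch of that pairing, assuming the intended call is RMA summarised at the transcript (gene) level with `target = "core"`, which is a valid `target` value for `oligo::rma()` on Gene ST arrays. The helper `annotation_package()` is hypothetical and only encodes the rule discussed here: `target = "core"` goes with `transcriptcluster.db`, `target = "probeset"` goes with `probeset.db`. The RMA call runs only if `oligo` is installed and an object named `rawData` exists.

```r
# Pick the annotation package that matches the RMA summarisation level.
# annotation_package() is a hypothetical helper, not part of any package.
annotation_package <- function(prefix, target = c("core", "probeset")) {
  target <- match.arg(target)
  level <- if (target == "core") "transcriptcluster" else "probeset"
  paste0(prefix, level, ".db")
}

pkg_core     <- annotation_package("hugene10st", "core")
pkg_probeset <- annotation_package("hugene10st", "probeset")
print(pkg_core)      # "hugene10sttranscriptcluster.db"
print(pkg_probeset)  # "hugene10stprobeset.db"

# Transcript-level RMA on real data (skipped unless rawData and oligo exist):
if (exists("rawData") && requireNamespace("oligo", quietly = TRUE)) {
  eset <- oligo::rma(rawData, background = TRUE, normalize = TRUE,
                     target = "core")
}
```

The ids in the resulting expression set are then transcript cluster ids, so they should be looked up in the package named by `pkg_core` rather than the probeset one.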
Please see the difference, here: C: Human Exon array probeset to gene-level expression
Doing it the way you suggest for the 1st example gives only 215 successfully converted ids out of a total of 33297.
Ah ok, I was doing it wrong. Thank you!
All good then? / Tutto bene? / Todo bien? / Tudo bem?