4.4 years ago
xiaoyonf
Hi, I tried to use biomaRt in R to convert affy_hugene_1_0_st_v1 probe set IDs to gene symbols, using the following lines:
library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
symbol <- getBM(attributes = c("affy_hugene_1_0_st_v1", "hgnc_symbol", "ensembl_gene_id"), filters = "affy_hugene_1_0_st_v1", values = rownames(exprs(gset)), mart = mart)
The number of probes is 257430, but the number of annotated genes is only 3881 (< 2% coverage).
I would appreciate it if anyone could help me out!
Many thanks,
Xiaoyong
Hi Kevin,
Thank you for your prompt reply. I downloaded this dataset from GEO as follows (as suggested in one of your prior posts):
gset <- getGEO("GSE49124", GSEMatrix = TRUE, getGPL = FALSE)
if (length(gset) > 1) idx <- grep("GPL10739", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
Yes, the data is still at the probe set level, with probe IDs ranging from 7892501 to 8180418. Could you please explain why the biomaRt query did not work? I will try the methods you suggested, but I am having trouble installing hugene10sttranscriptcluster.db in R 4.0.0. I will post an update if this works. Thanks!
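For readers who hit the same installation problem, here is a minimal sketch of one way to install and query these annotation packages. It assumes installation through BiocManager and a lookup through AnnotationDbi::select() with the PROBEID key type. The helper name annotate_probesets and the choice of SYMBOL and ENSEMBL columns are illustrative assumptions, not something taken from this thread.

```r
# Hypothetical helper (name is an assumption): map HuGene 1.0 ST IDs to
# gene symbols with a Bioconductor annotation package instead of biomaRt.
annotate_probesets <- function(ids, pkg = "hugene10stprobeset.db") {
  # Install the annotation package through BiocManager if it is missing.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  }
  # Annotation packages export a database object named after the package.
  db <- getExportedValue(pkg, pkg)
  # One input ID can return several rows when it maps to several genes.
  AnnotationDbi::select(
    db,
    keys = as.character(ids),
    columns = c("SYMBOL", "ENSEMBL"),
    keytype = "PROBEID"
  )
}

# Example call, assuming `gset` is the ExpressionSet loaded above:
# ann <- annotate_probesets(rownames(Biobase::exprs(gset)))
```

Passing pkg = "hugene10sttranscriptcluster.db" would query the transcript cluster package instead. Which of the two matches the IDs in a given dataset has to be checked against the data.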
I have had similar issues with this array in the past, in terms of annotation. If I recall correctly, there are both probeset and transcript cluster IDs, but the way in which they are assigned makes it difficult. This said, I have never had issues when annotating via
hugene10sttranscriptcluster.db
or hugene10stprobeset.db
- these are manually-curated database packages by James (Bioconductor).

Hi Kevin,
An update on the annotation approaches you suggested: I used hugene10stprobeset.db to annotate this array's probes and got almost 100% coverage. I noticed that many probes map to the same gene and vice versa, which I think is due to the nature of this array as an exon array. For the subsequent DGE analysis, I used the mean value of all probes assigned to each gene. Do you think that is OK? With the hugene10sttranscriptcluster.db annotations, oddly, only very few genes (~200) were mapped, and I don't know why. Thanks again for your answer!
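The per-gene averaging described above can be sketched in base R as follows. This is only an illustration on a toy matrix: the names expr, symbol and collapse_by_gene are assumptions, each probe is assumed to carry at most one gene symbol, and unannotated probes (NA) are simply dropped.

```r
# Toy expression matrix: 5 probe sets (rows) x 3 samples (columns).
expr <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    10, 10, 10,
    2, 2, 2,
    7, 8, 9),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("sample", 1:3))
)

# Hypothetical probe-to-gene mapping; NA marks an unannotated probe set.
symbol <- c("GENE_A", "GENE_A", "GENE_B", NA, "GENE_B")

# Average all probe sets assigned to the same gene, sample by sample.
collapse_by_gene <- function(expr, symbol) {
  keep <- !is.na(symbol)
  # Sum the rows per gene, then divide each gene's row by its probe count.
  sums <- rowsum(expr[keep, , drop = FALSE], group = symbol[keep])
  counts <- as.vector(table(symbol[keep])[rownames(sums)])
  sums / counts
}

gene_expr <- collapse_by_gene(expr, symbol)
print(gene_expr)
```

The limma package offers avereps() for a similar kind of averaging over duplicated IDs. Probes that map to several genes need a separate decision (for example dropping them), which this sketch does not make.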