Hello all,
Newbie in microarray analysis here. I am trying to run a differential expression analysis on some Affymetrix microarray data. I know that the array used in the experiment was the HG U95A.
I am trying to identify the corresponding ensembl_gene_id for every probe ID using this biomaRt code:
library(biomaRt)
# declare the human (hsapiens) Ensembl mart
hsap_mart = useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# map the Affymetrix probe ID to its Ensembl gene IDs
affy_probe_genenames = getBM(attributes = c("ensembl_gene_id", "affy_hg_u95a"),
                             filters = "affy_hg_u95a", values = "1007_s_at",
                             mart = hsap_mart, useCache = FALSE)
I notice that for probe 1007_s_at I get the following 5 ensembl_gene_ids: "ENSG00000234078", "ENSG00000137332", "ENSG00000230456", "ENSG00000215522" and "ENSG00000204580".
Since there is only one expression value for 1007_s_at in the dataset, I was wondering how one usually chooses the corresponding ensembl_gene_id when, as in this case, a single probe ID maps to multiple gene IDs.
All 5 Ensembl gene IDs do seem to have the same gene symbol (DDR1).
Thanks in advance!
Thanks for your message! I see, the trick with entrezgene_id does not seem to work, since all of the genes have the same Entrez ID (as also shown in the message from @bk11). However, the trick with the contig/chromosome name does seem to work, as pointed out by @bk11. So is the logic here that genes located on the primary chromosomes are preferred over the ones located on scaffolds?
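For reference, the chromosome-name trick could be sketched roughly as below. This is only a minimal sketch under some assumptions: the helper names `fetch_probe_mapping` and `keep_primary_chromosomes` are made up for illustration, the "primary" set is assumed to be chromosomes 1-22, X, Y and MT, and the toy data frame uses placeholder IDs and scaffold names rather than real biomaRt output. The only biomaRt addition relative to the query above is the `chromosome_name` attribute.

```r
# Hypothetical helper (not from the thread): fetch probe-to-gene mappings
# together with the chromosome/scaffold name of each gene.
# Needs the biomaRt package and network access; it is defined but not called here.
fetch_probe_mapping <- function(probe_ids) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("affy_hg_u95a", "ensembl_gene_id",
                                "chromosome_name"),
                 filters = "affy_hg_u95a", values = probe_ids,
                 mart = mart, useCache = FALSE)
}

# Hypothetical helper: keep only rows whose gene sits on a primary chromosome.
# Assumption: "primary" means 1-22, X, Y and MT; anything else is treated
# as a scaffold/alternative sequence and dropped.
keep_primary_chromosomes <- function(mapping) {
  primary <- c(as.character(1:22), "X", "Y", "MT")
  mapping[mapping$chromosome_name %in% primary, , drop = FALSE]
}

# Toy data shaped like the getBM() result (placeholder values, not real output).
toy_mapping <- data.frame(
  affy_hg_u95a    = rep("probe_1", 3),
  ensembl_gene_id = c("gene_primary", "gene_alt_1", "gene_alt_2"),
  chromosome_name = c("6", "scaffold_A", "scaffold_B"),
  stringsAsFactors = FALSE
)

filtered <- keep_primary_chromosomes(toy_mapping)
print(filtered)
```

With real data one would presumably call `keep_primary_chromosomes(fetch_probe_mapping(probe_ids))`. Note that a probe whose genes all sit on scaffolds would be dropped entirely by this filter, so it is probably worth checking how many probes are lost before relying on it.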