I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but am having problems. Using the getBM function, I can see that only 22k of those have corresponding gene symbols, but the output is a vector of length 22k, and I am unable to see how it corresponds to the initial probe ID list. Additionally, I think that some of these probe IDs don't correspond to Agilent probes known to biomaRt (using other attributes such as "chromosome_name" gives me nothing for some of the probe IDs). Using getBMlist, I can get an output with NA values for those probes that don't match, but the function gives a warning that getBMlist isn't meant for large lists, and the entire process takes too long. How do I get an output of 90k entries, with gene symbols where available and NA values otherwise?
The probes are mostly from the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F. An example name is A_23_P100001. An example probe ID that doesn't give me any attribute in biomaRt is A_23_P116864.
The query I'm using is as follows:
affyids <- read.csv([data goes here])
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
getBM(uniqueRows = FALSE, filters = "efgagilentwholegenome4x44kv1", attributes = c("chromosome_name", "start_position", "external_gene_id"), values = affyids, mart = mart)
where affyids is of type "list".
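One possible way to keep the correspondence, offered as a sketch rather than a definitive fix: request the probe ID itself as one of the attributes in getBM, then align the returned rows to the full probe list so that unmatched probes come out as NA. The helper `align_to_probes` and the column name `probe` below are hypothetical, the gene symbol `GENE_A` is a made-up placeholder, and the small data frame stands in for real getBM output so the alignment step can be run without a network connection. The exact probe attribute name is an assumption and should be checked with `listAttributes(mart)`.

```r
# Hypothetical helper (not part of biomaRt): align annotation rows to the
# full probe list, returning NA for probes that have no annotation.
align_to_probes <- function(probe_ids, annot, id_col, symbol_col) {
  probe_ids <- as.character(probe_ids)
  # match() returns the FIRST annotation row per probe, or NA if there is none
  idx <- match(probe_ids, as.character(annot[[id_col]]))
  symbol <- as.character(annot[[symbol_col]])[idx]
  # Treat empty symbols as missing
  symbol[!is.na(symbol) & symbol == ""] <- NA
  data.frame(probe_id = probe_ids, symbol = symbol, stringsAsFactors = FALSE)
}

# In a real session the annotation would come from something like the
# following (probe_attr is whatever listAttributes(mart) reports for the array):
#   library(biomaRt)
#   mart  <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
#   annot <- getBM(attributes = c(probe_attr, "external_gene_id"),
#                  filters = probe_attr, values = probe_ids, mart = mart)

# Stand-in data: the two probe IDs from the question, with a placeholder symbol
probe_ids <- c("A_23_P100001", "A_23_P116864")
annot <- data.frame(probe = "A_23_P100001",
                    external_gene_id = "GENE_A",
                    stringsAsFactors = FALSE)

result <- align_to_probes(probe_ids, annot, "probe", "external_gene_id")
print(result)
```

The result has one row per input probe, in the input order. Note that read.csv returns a data frame, so the probe IDs would first need to be pulled out as a plain vector (for example `affyids[[1]]`, assuming they sit in the first column). A probe that maps to several genes keeps only its first symbol in this sketch.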
1) Why don't you download the probeId-to-gene symbol mappings from the chip manufacturer?
2) If you want people to help you, you have to be more specific. What kind of probes are you talking about, and what commands/data objects did you use to do the biomaRt query?
Agreed. Maybe you can add your R code here.
Dear goldexperience, please don't delete questions that have answers; you are taking away other people's ability to get informed.
Apologies, I just reposted the question with a different format and direction.
It would be much better if you edited your original question and made it more specific. Most likely, the answer can be adapted with a few edits as well.
OK. I see that is acceptable.