How Do I Use biomaRt's getBM on a List of Known and Unknown Agilent Probes to Convert It to a List of Gene IDs with Spacing for Missing Values?
11.7 years ago

I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but am having problems. Using the getBM function, I can see that only 22k of those have corresponding gene symbols, but the output is a vector of length 22k, and I am unable to see the correspondence to the initial probe ID list. Additionally, I think that some of these probe IDs don't correspond to Agilent probes known by biomaRt (using other attributes such as "chromosome_name" gives me nothing for some of the probe IDs). Using getBMlist, I can get an output with NA values specified for those probes that don't match, but the function gives a warning message that getBMlist isn't for large lists, and the entire process takes too long. How do I get an output of 90k gene symbols and NA values?

The probes are mostly off the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F. An example name is A_23_P100001. An example probe ID that doesn't give me any attribute in biomaRt is A_23_P116864.

The query I'm using is as follows:

affyids = read.csv([data goes here]);
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"));
getBM(uniqueRows = FALSE, filters="efg_agilent_wholegenome_4x44k_v1", attributes=c("chromosome_name","start_position","external_gene_id"), values=affyids, mart=mart);

where affyids is of type "list."

r biomart bioconductor • 13k views

1) Why don't you download the probeId-to-gene symbol mappings from the chip manufacturer?

2) If you want people to help you, you have to be more specific. What kind of probes are you talking about, what commands/data objects did you use to do the biomaRt query?


Agreed. Maybe you can add your R code here.


Dear goldexperience, please don't delete questions that have answers; you are taking away other people's ability to get informed.


Apologies, I just reposted the question with a different format and direction.


It would be much better if you edited your original question and made it more specific. Most likely, the answer can be adapted with a few edits as well.


OK, I see that is acceptable.

9.6 years ago

This is an old question, but I ran into this problem recently. If you have a list of IDs where some values are not recognized, getBM returns a result smaller than the query list. I wanted a list that included ALL of the original IDs in the query order, not just the ones that mapped.

This means we want a left join between the original IDs and the query results. The merge() function can do this for us.

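library(biomaRt)
# Query IDs as a one-column matrix; the column name must match the result column to join on
refSeqIds = as.matrix(c("NR_000001", "NR_000002"))  # ... and the rest of your IDs
colnames(refSeqIds) = "refseq_mrna"  #name of column in results to join on
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
results <- getBM(filters="refseq_mrna", attributes=c("refseq_mrna","external_gene_name"), values=refSeqIds, mart=mart)
# Left join: all.x=TRUE keeps every query ID; unmapped IDs get NA
idmap = merge(x = refSeqIds, y = results, by="refseq_mrna", all.x=TRUE)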

Output:

     refseq_mrna         external_gene_name
1    NR_000001           GeneA
2    NR_000002           <NA>
3    NR_000003           GeneB
4    NR_000004           GeneC
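
One caveat worth checking: merge() sorts its result by the join column by default, so the rows may not come back in the original query order. Below is a minimal sketch of an order-preserving alternative using match(). The `results` data frame is mock data standing in for real getBM() output (the gene names are placeholders), and the sketch assumes at most one gene per ID; with one-to-many mappings, match() would keep only the first hit.

```r
# Query IDs in the order we want to keep (deliberately not sorted)
query <- c("NR_000003", "NR_000001", "NR_000002", "NR_000004")

# Mock stand-in for getBM() output: NR_000002 is unmapped, rows in arbitrary order
results <- data.frame(
  refseq_mrna        = c("NR_000001", "NR_000004", "NR_000003"),
  external_gene_name = c("GeneA", "GeneC", "GeneB"),
  stringsAsFactors   = FALSE
)

# For each query ID, find its row in `results` (NA if the ID is absent)
idx <- match(query, results$refseq_mrna)

# Indexing with NA yields NA, so unmapped IDs keep their slot with an NA symbol
idmap <- data.frame(
  refseq_mrna        = query,
  external_gene_name = results$external_gene_name[idx],
  stringsAsFactors   = FALSE
)
print(idmap)
```

The result has exactly one row per query ID, in query order, which is the shape the original question asks for.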

This is the correct answer, thank you!

Here is a generic version:

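getAllBM <- function(attributes, filters = '', values = '', mart, curl = NULL, checkFilters = TRUE, verbose = FALSE, uniqueRows = FALSE, bmHeader = FALSE) {
    # Query biomaRt; IDs it does not recognize are simply absent from the result
    spotty <- getBM(attributes = attributes, filters = filters, values = values, mart = mart, curl = curl, checkFilters = checkFilters, verbose = verbose, uniqueRows = uniqueRows, bmHeader = bmHeader)
    # One-column data frame of the query IDs, named after the filter so merge() can join on it
    x <- as.data.frame(values)
    colnames(x) <- filters
    # Left join keeps every query ID (NA where unmapped); assumes a single filter that is also listed in `attributes`
    merged <- merge(x = x, y = spotty, by = filters, all.x = TRUE)
    # merge() sorts by the key, so restore the query order (row names are not set, since an ID may map to several rows)
    merged[order(match(merged[[filters]], values)), , drop = FALSE]
}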
11.7 years ago
Irsan ★ 7.8k

In R:

library(biomaRt)
mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
getBM(mart=mart,filters="efg_agilent_wholegenome_4x44k_v1",attributes=c("efg_agilent_wholegenome_4x44k_v1","external_gene_id","ensembl_gene_id","description"),values="A_32_P196615")

gives you:

efg_agilent_wholegenome_4x44k_v1 external_gene_id ensembl_gene_id description
1                     A_32_P196615     RP11-449J1.1 ENSG00000225334          NA

This retrieves gene information for one probe ID. Change values="A_32_P196615" to values=vectorWithIds to do it for multiple probes. The values argument only has to be a list if you use multiple filters; for a single filter, a plain vector works.
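
Because the probe ID is requested as an attribute, the result can be compared against the input vector to see which probes were dropped. Here is a small sketch with mock data in place of a real getBM() call: `res` imitates the output shown above, and A_23_P116864 is the probe the question reports as returning nothing.

```r
# Input probes; A_23_P116864 is reported in the question as unmapped
vectorWithIds <- c("A_32_P196615", "A_23_P116864")

# Mock stand-in for the getBM() result (only the mapped probe comes back)
res <- data.frame(
  efg_agilent_wholegenome_4x44k_v1 = "A_32_P196615",
  external_gene_id                 = "RP11-449J1.1",
  stringsAsFactors                 = FALSE
)

# Probes that were queried but are absent from the result
unmapped <- setdiff(vectorWithIds, res$efg_agilent_wholegenome_4x44k_v1)
print(unmapped)
```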


Thank you for your answer. I'm fairly certain that that's identical to the command I posted in my last comment; unfortunately, it doesn't provide any indication of missing probes. If I give it a list of 100 probes, 80 of which are identified, it gives me a list of 80 gene symbols without indication of the correspondence with the original 100-element list.


That is because your command asked for the gene symbol only, whereas my command asked for the gene symbol, description, gene ID, and the corresponding probe ID. In your example (100 probes, 80 of which can be mapped), some probes cannot be mapped to genes, so biomaRt/Ensembl will not find them either. But are you interested in the genes or in the genomic positions of the probes?


I'm still having issues and have updated the question accordingly. The problem seems to be that some of the IDs aren't recognized by biomaRt at all.
