Hello,
I would like to query the biomaRt databases to retrieve Ensembl gene IDs (`ensembl_gene_stable_id`) for a list of SNPs (`snp_filter`) taken from the user input `testData$rsNum`, in a tidyverse way.
```r
# read_table() splits on any whitespace, so the literal data below parses
# whether the columns are separated by tabs or by spaces
testData <- readr::read_table("rs1467475747 8 148357
rs1378018226 8 148383
rs546813474 8 148402
rs1175049916 8 148522
rs1187272067 8 148523
rs1427441701 8 148553
rs201635470 8 148556
rs1483428031 8 148608
rs1251102826 8 148610",
                              col_names = c("rsNum", "chrNum", "pos"),
                              col_types = "cii")
```
I tried to pass the filter values as bare column names, as below:
```r
library(biomaRt)
library(magrittr)  # provides %>%

grch37.snp <- useMart(biomart = "ENSEMBL_MART_SNP", host = "grch37.ensembl.org",
                      dataset = "hsapiens_snp")

testData %>% getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end",
                                  "ensembl_gene_stable_id", "associated_gene"),
                   filters = c("snp_filter", "chr_name", "start", "end"),
                   values = list(rsNum, chrNum, pos, pos),
                   mart = grch37.snp, uniqueRows = TRUE)
```
which resulted in the error:

```
Error in getBM(., attributes = c("refsnp_id", "chr_name", "chrom_start", : object 'rsNum' not found
```
Is there an error in this approach to querying the marts?
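For reference, the same error can be reproduced without a network connection. The sketch below assumes R >= 4.1 (it uses the native pipe `|>` instead of `%>%` only to avoid a package dependency; both pipes insert the left-hand side as an argument in the same way), and `fake_getBM()` is a hypothetical stand-in for `biomaRt::getBM()` that just returns its `values` argument. It suggests that the pipe hands the data frame to the function as an argument, but does not make its columns visible as variables when the other arguments are evaluated.

```r
# Hypothetical stand-in for biomaRt::getBM(): it only returns its `values`
# argument, so the lookup of `rsNum` can be observed without querying a mart.
fake_getBM <- function(attributes, filters = "", values = "", mart = NULL) {
  values
}

# Two rows of the test data from the question
testData <- data.frame(
  rsNum  = c("rs1467475747", "rs1378018226"),
  chrNum = c(8L, 8L),
  pos    = c(148357L, 148383L)
)

# `|>` rewrites this as fake_getBM(testData, attributes = ..., ...).
# testData fills whichever formal is still unmatched (here `mart`); it is
# never searched for column names. `values` is evaluated in the calling
# environment, where no variable called `rsNum` exists.
result <- tryCatch(
  testData |> fake_getBM(attributes = "refsnp_id",
                         filters = "snp_filter",
                         values = list(rsNum)),
  error = function(e) conditionMessage(e)
)
print(result)
```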
However, I have also found a workaround that achieves the same result in another way (source):
```r
getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end",
                     "ensembl_gene_stable_id", "associated_gene"),
      filters = c("snp_filter", "chr_name", "start", "end"),
      values = list(testData$rsNum, testData$chrNum, testData$pos, testData$pos),
      mart = grch37.snp, uniqueRows = TRUE)
```
Although the latter command gives the expected output, I am looking for a way to make the former approach work by passing only the column names (`rsNum`, `chrNum`, `pos`, `pos`). Are you aware of any way to do this?
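One option that seems to fit is to evaluate the call with the data frame's columns in scope, for example with base R's `with()` (or magrittr's exposition pipe `%$%`, which does the same thing in pipe form). The sketch below checks this offline with a hypothetical stand-in, `fake_getBM()`, that only returns its `values` argument; the commented-out real `getBM()` call follows the same pattern but has not been run against the mart here.

```r
# Hypothetical stand-in for biomaRt::getBM() that returns its `values`
# argument, so the column lookup can be checked without a network connection.
fake_getBM <- function(attributes, filters = "", values = "", mart = NULL) {
  values
}

testData <- data.frame(
  rsNum  = c("rs1467475747", "rs1378018226"),
  chrNum = c(8L, 8L),
  pos    = c(148357L, 148383L)
)

# with() evaluates the call with the columns of testData available as
# variables, so the bare names rsNum, chrNum and pos resolve.
vals <- with(testData,
  fake_getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
             filters    = c("snp_filter", "chr_name", "start", "end"),
             values     = list(rsNum, chrNum, pos, pos)))
str(vals)

# With the real mart (not run here), the same pattern would be:
#   with(testData, getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
#                        filters = c("snp_filter", "chr_name", "start", "end"),
#                        values = list(rsNum, chrNum, pos, pos),
#                        mart = grch37.snp, uniqueRows = TRUE))
# or, after library(magrittr), replacing `with(testData, getBM(...))`
# by `testData %$% getBM(...)` with the same arguments.
```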
Thanks for your interest in answering the question.