Trouble getting gene names from biomaRt
19 months ago
Barista ▴ 10

I have an Excel file containing the columns chrom, pos, id, ref and alt. I want to add a new column holding the gene name for each row.

For that I am using the getBM() function in biomaRt, but it takes too long to finish. I realize it may be slow because my dataset contains 500,000 rows, but it has now been running for over an hour and still has not finished.

This is how I do it:

options(max.print = 1000000)
library(readxl)
library(dplyr)
library(biomaRt)
library(xlsx)  # assumption: write.xlsx() below comes from the xlsx package

# Read the variant table and keep only the columns of interest
vcf_data <- read_excel("/Users/.../rows.xlsx", col_names = TRUE)
vcf_data <- dplyr::select(vcf_data, chrom, pos, id, ref, alt)
# Drop rows whose id starts with "ns"
vcf_data <- vcf_data[!grepl("^ns", vcf_data$id), ]

# Query Ensembl; the coordinate attributes are requested so that the
# merge below has columns to join on
mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
gene_names <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position"),
                    filters = c("chromosome_name", "start", "end"),
                    values = list(vcf_data$chrom, vcf_data$pos, vcf_data$pos),
                    mart = mart)

# Join on chromosome and gene start position
merged_data <- merge(vcf_data, gene_names,
                     by.x = c("chrom", "pos"),
                     by.y = c("chromosome_name", "start_position"))

write.xlsx(merged_data, "/Users/.../fileWithGeneNames.xlsx", row.names = FALSE)

Is there a better way to do it? This is my first time using biomaRt, so I might have done something wrong.

biomaRt R • 937 views
19 months ago
Emily 24k

biomaRt cannot handle queries of this size. Use the Ensembl Variant Effect Predictor (VEP) instead.
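
To illustrate why the query size matters: the call in the question sends 500,000 values per filter, whereas a table of gene coordinates is comparatively small. The following is only a rough sketch in base R, not VEP and not a substitute for it. It assumes a gene table has already been obtained once by some means; the column names (chrom, start, end, symbol), the toy data and the helper annotate_variants() are all hypothetical and chosen for illustration. Positions are then matched to gene intervals locally instead of being sent to the server.

```r
# Toy gene table standing in for a one-off download of gene coordinates.
# Column names are hypothetical, picked for this sketch only.
genes <- data.frame(
  chrom  = c("1", "1", "2"),
  start  = c(100, 500, 100),
  end    = c(200, 900, 300),
  symbol = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Toy variant table with the same kind of columns as in the question.
variants <- data.frame(
  chrom = c("1", "1", "2", "2"),
  pos   = c(150, 950, 100, 50),
  id    = c("v1", "v2", "v3", "v4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each variant, collect the symbols of all genes
# on the same chromosome whose [start, end] interval contains the position.
annotate_variants <- function(variants, genes) {
  variants$gene <- NA_character_
  for (chr in unique(variants$chrom)) {
    g <- genes[genes$chrom == chr, ]
    if (nrow(g) == 0) next
    rows <- which(variants$chrom == chr)
    variants$gene[rows] <- vapply(variants$pos[rows], function(p) {
      hit <- g$symbol[g$start <= p & p <= g$end]
      # Several overlapping genes are joined with ";", no hit stays NA
      if (length(hit) == 0) NA_character_ else paste(hit, collapse = ";")
    }, character(1))
  }
  variants
}

annotated <- annotate_variants(variants, genes)
print(annotated)
```

This per-variant loop is fine for a toy table but would likely be slow on 500,000 rows; an interval-overlap function such as findOverlaps() from the GenomicRanges package is probably a better fit at that scale. VEP additionally reports variant consequences, which this sketch does not attempt.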


Thank you so much! I will try this out right now. If I run into further problems with VEP, may I add a new comment here and count on your help? I would appreciate it a lot! :)
