EDIT: Now fixed, using filters = c("chromosome_name","start","end","biotype") instead. Question still stands though if anyone knows?
Original question: I've written some code which I think should take chromosomal coordinates (e.g. X:1-200000) and return protein coding genes (e.g. PLCXD1 and LINC00108) within that region on NCBI36.
I'm happy for it to return a gene only partially spanning the region or only genes contained entirely within that region at this stage, but I have no idea what it is actually returning.
Here's some simplified code of what I'm doing:
rm(list=ls())
library("biomaRt")
# Connect to the Ensembl 54 archive (May 2009)
ensembl54 <- useMart("ENSEMBL_MART_ENSEMBL", host="may2009.archive.ensembl.org/biomart/martservice/", dataset="hsapiens_gene_ensembl")
# Regions of interest, written as chromosome:start-end
chr.region <- c("18:19092052-30289323","16:77462979-77500915","X:146419715-146584279","4:32776556-33393589","2:187947442-188506618")
entrez.ids <- vector()
entrez.count <- vector()
all.results <- data.frame()
for (i in seq_along(chr.region)) {
  # One value per filter: the region string and the biotype
  filterlist <- list(chr.region[i], "protein_coding")
  results <- getBM(attributes = c("hgnc_symbol", "entrezgene", "chromosome_name", "start_position", "end_position"),
                   filters = c("chromosomal_region", "biotype"),
                   values = filterlist,
                   mart = ensembl54)
  # Record which input region each row came from
  results$region <- chr.region[i]
  all.results <- rbind(all.results, results)
  # Collect the unique, non-missing Entrez IDs for this region
  ids <- unique(results$entrezgene)
  ids <- ids[!is.na(ids)]
  entrez.ids[i] <- paste(ids, collapse=",")
  entrez.count[i] <- length(ids)
}
write.csv(all.results, file="all_results.csv", row.names=FALSE)
Many thanks for any advice.
Could you post a few lines from the csv file that you are writing ?
Hi Sudeep, here are two lines:
My question is now: if the input region is between 19092052 and 30289323, why does the chromosomal_region filter return a gene at 75967903? Basically, I am unsure what that filter is for, and I'm just curious :)
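One way to see exactly which returned rows fall outside the requested region is to check the coordinates locally, independent of how any filter behaves. The following is only a sketch: classify_hits is a hypothetical helper, and the example rows are made up for illustration (only the 75967903 start coordinate comes from the question above). It assumes the results data frame has the chromosome_name, start_position and end_position columns requested in the query.

```r
# Hypothetical helper: flag each row of a getBM-style result as
# overlapping the region and/or lying entirely within it.
classify_hits <- function(results, chromosome, start, end) {
  same_chr <- as.character(results$chromosome_name) == chromosome
  # Overlap: the gene starts before the region ends and ends after it starts
  results$overlaps_region <- same_chr &
    results$start_position <= end & results$end_position >= start
  # Containment: both gene ends lie inside the region
  results$within_region <- same_chr &
    results$start_position >= start & results$end_position <= end
  results
}

# Made-up rows for illustration only (not real query output)
example <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  chromosome_name = c("18", "18", "18"),
  start_position = c(19000000, 20000000, 75967903),
  end_position = c(19100000, 20050000, 75970000),
  stringsAsFactors = FALSE
)

checked <- classify_hits(example, "18", 19092052, 30289323)
print(checked)
```

Rows where overlaps_region is FALSE are the ones that would need explaining, such as the gene starting at 75967903.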
Your code looks fine now that you have added the filters. Do you have any other questions?
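For anyone reading later, the filter change mentioned in the edit could look roughly like the sketch below. The helper names parse_region and genes_in_region are hypothetical and not part of biomaRt; only getBM and the filter and attribute names come from the post. Coordinates are kept as character strings so they are passed to the query exactly as written. Whether genes that only partially overlap the region are returned depends on how the mart defines the start and end filters, which is not verified here.

```r
# Hypothetical helper: split "chr:start-end" into its three parts.
# Values stay as character strings so they are sent unchanged.
parse_region <- function(region) {
  parts <- strsplit(region, "[:-]")[[1]]
  stopifnot(length(parts) == 3)
  list(chromosome = parts[1], start = parts[2], end = parts[3])
}

# Hypothetical wrapper: query one region using the four separate
# filters from the edit instead of chromosomal_region.
# 'mart' is assumed to be a Mart object such as ensembl54 in the question.
genes_in_region <- function(region, mart) {
  r <- parse_region(region)
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "entrezgene", "chromosome_name",
                   "start_position", "end_position"),
    filters = c("chromosome_name", "start", "end", "biotype"),
    values = list(r$chromosome, r$start, r$end, "protein_coding"),
    mart = mart
  )
}

# Parsing alone needs no network access
r <- parse_region("18:19092052-30289323")
str(r)
```

With a mart connection in hand, genes_in_region("18:19092052-30289323", ensembl54) would replace the getBM call inside the loop.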
Thanks Giovanni - I'm just wondering what the chromosomal_region filter does?