Question

how to query biomaRt from a GRanges object

1

Entering edit mode

9.8 years ago

sebastianzeki0 ▴ 240

I can't quite figure this out.

I'd like to query a list of co-ordinates using biomart to find out interesting stuff about the co-ordinates and basically play around with this new toy I've discovered and see what it can do.

However I can't figure out how to add this list of genome co-ordinates (from a Grange list) to the query.

What I have so far is

library(GenomicRanges)

BEDfile = "/Normalized_countsbed.bed"

#Load the annotation and reduce it
BED <- import.gff(BEDfile, format="BED", genome="hg19", asRangedData=F)

library("biomaRt")
#List all the databases available
listMarts()
#Select the top level database
ensembl=useMart("ensembl")
#Select the dataset from ensembl database
listDatasets(ensembl)
listDatasets(ensembl)$dataset
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")

filters = listFilters(ensembl)

attributes = listAttributes(ensembl)

#Perform the query using the vignette example
affyids=c("202763_at","209310_s_at","207500_at")
getBM(attributes=c('affy_hg_u133_plus_2', 'entrezgene'), filters = 'affy_hg_u133_plus_2', values = BED, mart = ensembl)

but I get the error as follows:

Error in as.vector(x, mode = "character") : no method for coercing this S4 class to a vector

What to do?

sequencing granges biomart r • 4.5k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sebastianzeki0 ▴ 240

Ram · Answer 1 · 2015-02-24

0

Entering edit mode

9.8 years ago

Giovanni M Dall'Olio 28k

First of all, convert your GRanges object to a data frame:

> BED.df = as.data.frame(BED)
  seqnames  start    end width strand
1       16 543277 543366    90      *
2       16 770183 770277    95      *
3       16 178554 178656   103      *
4       16 439297 439397   101      *
5       16 396725 400746  4022      *
6       16 396769 399474  2706      *

Then, query biomaRt as you would normally do with a data frame:

getBM(
    attributes=c('entrezgene', 'chromosome_name', 'transcript_start', 'transcript_end'), 
    filters = c('chromosome_name', 'start', 'end' ), 
    values = with(BED.df, list(gsub("chr", "", as.character(seqnames)), start, end)), 
    mart = ensembl, 
    verbose=T
)

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Giovanni M Dall'Olio 28k

0

Entering edit mode

OK. That's almost there. When I run that it gives me lots of output to the console but seems to write it as an xml document e.g.

<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query virtualSchemaName = 'default' uniqueRows = '1' count = '0' datasetConfigVersion = '0.6' header='0' requestid= 'biomaRt'> <Dataset name = 'hsapiens_gene_ensembl'><Attribute name = 'entrezgene'/><Attribute name = 'chromosome_name'/><Attribute name = 'transcript_start'/><Attribute name = 'transcript_end'/><Filter name = 'chromosome_name' value = 'chr1,chr1,chr1,chr1,chr1,chr1

I thought the output should be a dataframe but when I do .....

mylovelylist = getBM(
    attributes=c('entrezgene', 'chromosome_name', 'transcript_start', 'transcript_end'), 
    filters = c('chromosome_name', 'start', 'end' ), 
    values = with(BED.df, list(as.character(seqnames), start, end)), 
    mart = ensembl, 
    verbose=T
)

I get no error, but when I try to get the output from mylovelylist I get

[1] entrezgene       chromosome_name  transcript_start transcript_end
<0 rows> (or 0-length row.names).

So how do I convert this xml output to a dataframe? How come it doesn't output as a dataframe by default. I tried adding output= "data.frame" and it just gives my the error: (output = "data.frame")

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sebastianzeki0 ▴ 240

0

Entering edit mode

Please don't answer to your own question like this - use the comments system instead. The XML output is because in the example I added the option verbose=T. Just remove it and the output will be less dense. Your final output list is empty because the chromosome names should not contain the word "chr" (e.g. see the chromosome_name field in the XML output). Try using

values = with(BED.df, list(gsub("chr", "", as.character(seqnames)), start, end))

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Giovanni M Dall'Olio 28k