How Do I Use Biomart To Get Upstream Flanking Sequence For A Gene?
12.2 years ago
Arturo_M ▴ 70

Hi. I'm trying to get the 100bp upstream sequences of some genes from A. gambiae. I'm using biomaRt and my query looks like this:

    vector <- useMart("vectorbase_mart_13", dataset="agambiae_eg_gene")
    agambiaeseq <- getBM(attributes=c('start_position', 'end_position', 'chromosome_name',
                                      'strand', 'ensembl_gene_id', 'gene_flank', 'upstream_flank'),
                         filters='ensembl_gene_id', value='AGAP004677', mart=vector)

I know that for the attribute 'upstream_flank' I should put the value 100 but I just don't know where.

Thank you for your attention.

biomart bioconductor • 6.4k views

I changed the title of your question to be more specific (just "Biomart" is too generic for a forum with many BioMart questions) and formatted your question so the code displays properly (four spaces before each code line is all that is needed). Welcome to Biostar, and thanks for your question! I've attempted an answer below.

12.2 years ago

The biomaRt documentation for 'getBM' says: "Sometimes attributes where a value needs to be specified, for example upstream_flank with value 20 for obtaining upstream sequence flank regions of length 20bp, are treated as filters in BioMarts. To enable such a query to work, one must specify the attribute as a filter and set checkFilters = FALSE for the query to work." For the 'values' argument, it also notes: "If multiple filters are specified then the argument should be a list of vectors of which the position of each vector corresponds to the position of the filters in the filters argument."

So, does this do what you are looking for?

    library('biomaRt')
    mart = useMart("vectorbase_mart_13", dataset="agambiae_eg_gene")
    # 'upstream_flank' is passed as a filter (value 100), not as an attribute;
    # checkFilters=FALSE lets getBM accept it.
    agambiaeseq = getBM(attributes=c('gene_flank', 'start_position', 'end_position',
                                     'chromosome_name', 'strand', 'ensembl_gene_id'),
                        filters=c('ensembl_gene_id', 'upstream_flank'),
                        values=list(ENSG='AGAP004677', Upstream=100),
                        mart=mart, checkFilters=FALSE)

The output looks like:

gene_flank start_position end_position chromosome_name strand ensembl_gene_id
ATCTCAAAATGGCAACATGTCAAACGCTAAGAAGACACCTCTTCTATATTCCACCTTGATTTGAACGGTAACATTCAGTAGTCCGTGGCTTTCGGATTAT         157348       186936              2L     -1      AGAP004677

This seems to correspond to what I imagine the equivalent query would look like in the VectorBase BioMart web interface.
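
If you need this for several genes, the positional pairing of 'filters' and 'values' quoted above can be wrapped in a small helper. This is only a sketch: `build_flank_query` is a hypothetical function name (not part of biomaRt), the second gene ID is a placeholder, and the commented-out `getBM` call assumes a `mart` object created as in the answer and a mart that is still reachable.

```r
# Hypothetical helper (not part of biomaRt): builds the 'filters' and 'values'
# arguments so that each element of 'values' sits at the same position as the
# filter it belongs to.
build_flank_query <- function(gene_ids, upstream = 100) {
  stopifnot(is.character(gene_ids), length(gene_ids) > 0,
            is.numeric(upstream), length(upstream) == 1, upstream > 0)
  list(
    filters = c("ensembl_gene_id", "upstream_flank"),
    values  = list(gene_ids, upstream)  # position 1: gene IDs, position 2: flank length
  )
}

# 'AGAP004677' comes from the question; 'AGAP004678' is only a placeholder ID.
query <- build_flank_query(c("AGAP004677", "AGAP004678"), upstream = 100)
str(query)

# With a 'mart' object as in the answer, the query would then be:
# getBM(attributes = c("gene_flank", "ensembl_gene_id"),
#       filters = query$filters, values = query$values,
#       mart = mart, checkFilters = FALSE)
```

Building the two arguments together makes it harder to accidentally swap the order of the gene IDs and the flank length.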


You have resolved my problem, thanks a lot!


Great. Glad to help. If you find the forum useful, please stick around, contribute more good questions, answers ... and vote! ;-)
