biomaRt: getBM & getSequence
9.5 years ago
bsmith030465 ▴ 240

Hi,

I am trying to extract the exon sequences for Ensembl transcript IDs (using GRCh37), but I get somewhat perplexing results, possibly from getBM:

myensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

eid <- "ENST00000538028"

details <- getBM(attributes = c("chromosome_name", "strand", "5_utr_start", "5_utr_end",
                                "genomic_coding_start", "genomic_coding_end",
                                "cdna_coding_start", "cdna_coding_end",
                                "cds_start", "cds_end", "3_utr_start", "3_utr_end"),
                 filters = "ensembl_transcript_id", values = eid, mart = myensembl)

print(details)

seq = getSequence(id=eid, type="ensembl_transcript_id", seqType="gene_exon", mart = myensembl)
show(seq)

Am I doing something wrong in getBM and/or getSequence?

My session info is:

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] biomaRt_2.24.0       Biostrings_2.36.1    XVector_0.8.0        IRanges_2.2.1        S4Vectors_0.6.0      BiocGenerics_0.14.0  hash_2.2.6           stringr_1.0.0        foreign_0.8-63
[10] BiocInstaller_1.18.2

loaded via a namespace (and not attached):
 [1] XML_3.98-1.1         bitops_1.0-6         GenomeInfoDb_1.4.0   DBI_0.3.1            magrittr_1.5         RSQLite_1.0.0        stringi_0.4-1        zlibbioc_1.14.0      tools_3.2.0
[10] Biobase_2.28.0       RCurl_1.95-4.6       AnnotationDbi_1.30.1
9.5 years ago

There's nothing obviously wrong with anything you're doing. If you're curious why you're getting 7 sequences rather than 1, it's because gene_exon means "sequence of each exon within a gene". Perhaps you want cdna instead.
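
To make the difference concrete, here is a minimal sketch. The helper name `transcript_seq` is hypothetical (it is not part of biomaRt); it only wraps the real `getSequence()` call so that the `seqType` can be switched, and it assumes a mart object like the `myensembl` from the question.

```r
# Hypothetical helper, not a biomaRt function: wraps getSequence() so that
# the sequence type can be switched for the same transcript ID.
transcript_seq <- function(transcript_id, mart, seq_type = "cdna") {
  # "gene_exon" returns one row per exon;
  # "cdna" returns the spliced transcript sequence instead.
  biomaRt::getSequence(id = transcript_id,
                       type = "ensembl_transcript_id",
                       seqType = seq_type,
                       mart = mart)
}
```

With the objects from the question, `transcript_seq(eid, myensembl)` should request the cDNA, and `transcript_seq(eid, myensembl, seq_type = "gene_exon")` should reproduce the per-exon result.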

Actually, I was thinking that I would get at least 7 rows from the getBM function.

Then you want the exon_chrom_start and exon_chrom_end attributes.
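
As a rough sketch of that query: the helper name `exon_coords` is hypothetical, and the extra attributes (`ensembl_exon_id`, `rank`) are my additions to make the rows easier to read. It assumes the same kind of mart object as `myensembl` in the question.

```r
# Hypothetical helper, not a biomaRt function: asks getBM() for exon-level
# attributes, so the result has one row per exon of the transcript.
exon_coords <- function(transcript_id, mart) {
  res <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_exon_id",
                                       "rank", "chromosome_name", "strand",
                                       "exon_chrom_start", "exon_chrom_end"),
                        filters = "ensembl_transcript_id",
                        values = transcript_id,
                        mart = mart)
  # Sort by the exon's position within the transcript.
  res[order(res$rank), ]
}
```

Calling `exon_coords(eid, myensembl)` with the objects from the question should then give the per-exon rows you were expecting from getBM.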

Got it. Thanks!
