Get Exon Sequence from biomaRt given ENST transcript ID and exon number.
20 months ago
acererak ▴ 10

Hello,

I have been trying to use biomaRt to extract an exon sequence given an Ensembl transcript ID (ENST) and an exon number. I also have the exon's start and end positions.

If I look up a given ENST on Ensembl and go to its Exons page, I find the sequence I want, but I am trying to automate this for several ENST IDs and several exons.

Thanks.

ensembl R biomart • 1.4k views

What have you tried? What errors did you get?


Sorry, I should have included this.

# fp_ENST_id and tp_ENST_id are defined earlier
grch37 <- biomaRt::useEnsembl(biomart = "ensembl", GRCh = 37, dataset = "hsapiens_gene_ensembl")

fp_exons <- biomaRt::getSequence(id = fp_ENST_id, 
                           type="ensembl_transcript_id",
                           seqType='gene_exon',
                           #seqType='cdna',
                           mart=grch37)

tp_exons <- biomaRt::getSequence(id = tp_ENST_id, 
                           type="ensembl_transcript_id",
                           seqType='gene_exon',
                           #seqType='cdna',
                           mart=grch37)

Two things from this: 1) it returns all exon sequences, and I can't figure out how to specify a single exon. I don't think I can assume the row number corresponds to the exon number. 2) Even making that assumption, the sequence returned by the biomaRt query does not match the one shown at https://grch37.ensembl.org/index.html, which I know to be correct based on other data.

20 months ago
Ben Moore ★ 2.4k

Hi acererak,

This isn't simple to do in a single BioMart query since you can't use exon number (or 'rank') as a filter in BioMart to select specific exons within a transcript.

However, you can do this using two BioMart queries. First, use the transcript stable IDs (ENST) as a filter and select 'Exon rank in transcript' and 'Exon stable ID' as attributes. From the results, you can work out the exon stable IDs (ENSE) for your exons of interest.

Then, in a separate BioMart query, use the list of exon stable IDs (ENSE) as a filter and select 'exon sequence' as an attribute.
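The two queries described above could be sketched in biomaRt roughly as follows. This is a minimal sketch, not a tested pipeline: the helper names (`select_exons`, `get_exon_table`, `get_exon_sequences`, `get_exons_by_rank`) and the `wanted` data frame are hypothetical, and it assumes the attribute names `ensembl_exon_id` and `rank` correspond to 'Exon stable ID' and 'Exon rank in transcript' in the `hsapiens_gene_ensembl` dataset.

```r
# Hypothetical helper: keep only the exons matching the requested
# (transcript, rank) pairs. `exon_table` needs the columns
# ensembl_transcript_id, ensembl_exon_id and rank.
select_exons <- function(exon_table, wanted) {
  merge(wanted, exon_table, by = c("ensembl_transcript_id", "rank"))
}

# Query 1: exon stable IDs and exon ranks for a set of transcripts.
get_exon_table <- function(transcript_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_exon_id", "rank"),
    filters    = "ensembl_transcript_id",
    values     = transcript_ids,
    mart       = mart
  )
}

# Query 2: sequences for a set of exon stable IDs (ENSE).
get_exon_sequences <- function(exon_ids, mart) {
  biomaRt::getSequence(
    id      = exon_ids,
    type    = "ensembl_exon_id",
    seqType = "gene_exon",
    mart    = mart
  )
}

# Chain both queries: one row per requested (transcript, rank) pair,
# with the exon sequence in the gene_exon column.
get_exons_by_rank <- function(wanted, mart) {
  exon_table <- get_exon_table(unique(wanted$ensembl_transcript_id), mart)
  hits <- select_exons(exon_table, wanted)
  seqs <- get_exon_sequences(unique(hits$ensembl_exon_id), mart)
  merge(hits, seqs, by = "ensembl_exon_id")
}

# Usage (needs network access; the rank values here are placeholders):
# grch37 <- biomaRt::useEnsembl(biomart = "ensembl", GRCh = 37,
#                               dataset = "hsapiens_gene_ensembl")
# wanted <- data.frame(ensembl_transcript_id = c(fp_ENST_id, tp_ENST_id),
#                      rank = c(3, 5))
# get_exons_by_rank(wanted, grch37)
```

Matching on the exon stable ID rather than on row order avoids relying on the order in which getSequence returns its rows. The `ensembl_transcript_id` filter is assumed to take IDs without a version suffix.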

Depending on the size of your dataset, I would also suggest considering using the Ensembl REST API which you can use to retrieve the sequence of features using the Ensembl stable IDs, or the sequence of a region if you have the start and end position.
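As a rough illustration of the REST route, the sketch below builds URLs for the `sequence/id` and `sequence/region` endpoints and fetches them as plain text with base R. The function names are hypothetical. It assumes the GRCh37 REST server at https://grch37.rest.ensembl.org (https://rest.ensembl.org serves the current assembly) and that your exon stable IDs resolve on that server.

```r
# Assumed GRCh37 REST server; swap in https://rest.ensembl.org for the
# current assembly.
ensembl_rest <- "https://grch37.rest.ensembl.org"

# URL for the sequence of a feature given its stable ID (e.g. an ENSE exon ID).
sequence_id_url <- function(stable_id, server = ensembl_rest) {
  sprintf("%s/sequence/id/%s?content-type=text/plain", server, stable_id)
}

# URL for the sequence of a genomic region; strand is 1 or -1.
sequence_region_url <- function(chrom, start, end, strand = 1,
                                species = "human", server = ensembl_rest) {
  sprintf("%s/sequence/region/%s/%s:%d..%d:%d?content-type=text/plain",
          server, species, chrom,
          as.integer(start), as.integer(end), as.integer(strand))
}

# Fetch a plain-text sequence from a URL (needs network access).
fetch_sequence <- function(u) {
  paste(readLines(u, warn = FALSE), collapse = "")
}

# Usage (exon_id, chrom, exon_start, exon_end are placeholders for your own data):
# fetch_sequence(sequence_id_url(exon_id))
# fetch_sequence(sequence_region_url(chrom, exon_start, exon_end, strand = -1))
```

For many IDs, it is worth pausing between requests, since the public REST server applies rate limits.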
