Question

Ensembl BioMart - How to get Transcript IDs version increment?

2

8.8 years ago

Thomas ▴ 160

Hello,

I am trying to get the genomic start and end positions of 3' UTRs using the Ensembl BioMart tool. However, some transcripts appear several times with the same transcript ID but different 3' UTR annotations.

e.g.:

Transcript ID    3' UTR start  3' UTR end
ENST00000474604  32793169      32793300
ENST00000474604  32792445      32792726
ENST00000474604  32791848      32791958
ENST00000474604  32791565      32791596
ENST00000474604  32790888      32791376

I was wondering whether there is any way to get BioMart (or any other Ensembl tool) to print the stable ID version increment (e.g. ENST00000474604.1, ENST00000474604.2, etc.). Otherwise, using this data is going to be a bit of a hassle.

Also, I have downloaded the full cDNA set for Homo sapiens (GRCh38) for processing, and it includes the version increment. Ideally, I would like to map the data from the cDNA FASTA file to the genomic coordinates obtained from BioMart.

I have combed through BioMart for an option to include the transcript version number, but I can't find one. I am confused as to why this information would be omitted from the output.

Thanks

EDIT: My question stemmed from a misreading of the BioMart output. See Sean Davis's post for more details.

Ensembl transcript-IDs Biomart version • 3.7k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 8.8 years ago by Thomas ▴ 160

1

Hi,

If you download the Gencode GTF, it has the Ensembl gene/transcript IDs, and they are versioned.

Also, if you go to the UCSC Table Browser and use All Gencode v23/22/20, you can get versioned ENST IDs. You can also use the Table Browser to select just the UTR info (as BED).
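To join versioned IDs (from a Gencode GTF or the cDNA FASTA) with unversioned IDs from BioMart, one option is to split off the version suffix. The following is a minimal sketch, assuming the versioned ID has the form "stable ID, dot, integer version" as in the ENST00000474604.1 example from the question. The function names and the sample FASTA header are hypothetical, for illustration only.

```python
def split_versioned_id(versioned_id):
    """Split 'ENST00000474604.1' into ('ENST00000474604', 1).

    Returns (stable_id, None) when there is no purely numeric version suffix.
    """
    stable_id, sep, version = versioned_id.partition(".")
    if sep and version.isdigit():
        return stable_id, int(version)
    return stable_id, None


def id_from_fasta_header(header):
    """Return the first whitespace-delimited token of a FASTA header line.

    Assumption: the sequence ID is the first token after '>'.
    """
    return header.lstrip(">").split()[0]


# Hypothetical header line, not copied from a real file
header = ">ENST00000474604.1 cdna"
stable_id, version = split_versioned_id(id_from_fasta_header(header))
print(stable_id, version)
```

The unversioned `stable_id` can then be used as the key when looking up BioMart rows.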

ADD REPLY • link 8.8 years ago by Amitm ★ 2.3k

0

Thank you very much, Amitm. I will try what you have suggested.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by Thomas ▴ 160

Ram · Accepted Answer · 2016-01-31

4

8.8 years ago

Sean Davis 27k

I think your interpretation of the results is perhaps not quite right (or I am misunderstanding your question). The five regions that you give are not different versions of the transcript. They reflect the fact that this particular transcript has five 3'-UTR exons. If you want the overall UTR start and end, you can take the minimum and maximum across the two columns; which one is the start and which is the end will depend on the strand of the transcript.
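As a rough sketch of the min/max approach, using the five rows from the question: the helper names below are made up for illustration, and the strand is passed in as a parameter (1 or -1) because the BioMart output shown does not include it.

```python
# Rows copied from the question: (transcript ID, 3' UTR start, 3' UTR end)
rows = [
    ("ENST00000474604", 32793169, 32793300),
    ("ENST00000474604", 32792445, 32792726),
    ("ENST00000474604", 32791848, 32791958),
    ("ENST00000474604", 32791565, 32791596),
    ("ENST00000474604", 32790888, 32791376),
]


def utr_spans(rows):
    """Collapse per-exon UTR segments into one (lowest, highest) span per transcript."""
    spans = {}
    for transcript_id, start, end in rows:
        lo, hi = spans.get(transcript_id, (start, end))
        spans[transcript_id] = (min(lo, start, end), max(hi, start, end))
    return spans


def oriented_start_end(span, strand):
    """Order a (lowest, highest) span in transcript orientation.

    Assumption: strand is 1 (forward) or -1 (reverse). On the reverse strand
    the UTR begins at the highest genomic coordinate.
    """
    lo, hi = span
    return (lo, hi) if strand == 1 else (hi, lo)


spans = utr_spans(rows)
print(spans["ENST00000474604"])  # (32790888, 32793300)
```

Note that this span also covers any introns lying between the UTR exons; keep the individual segments if you need the UTR sequence itself.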

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by Sean Davis 27k

0

Yes, you are correct. My mistake; what you said makes a lot more sense than my interpretation.

Just checking (I have an idea already): is it the same for GTF files? That is, is the same transcript listed multiple times (e.g. with feature='three_prime_utr'), once for each UTR exon within that transcript?

Many thanks

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by Thomas ▴ 160

1

Typically, yes.
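To illustrate, here is a minimal sketch that groups the UTR rows of a GTF by transcript. It assumes the standard nine tab-separated GTF columns, a feature column equal to 'three_prime_utr' as in the question, and a `transcript_id "..."` entry in the attributes column. The example lines are synthetic, not taken from a real annotation.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def utr_segments(gtf_lines, feature="three_prime_utr"):
    """Map each transcript ID to the list of (start, end) rows of the given feature."""
    segments = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            segments[match.group(1)].append((int(fields[3]), int(fields[4])))
    return dict(segments)


# Synthetic lines: one transcript "T1" with two 3' UTR segments
example = [
    '1\tsketch\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsketch\tthree_prime_utr\t150\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsketch\tthree_prime_utr\t300\t350\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(utr_segments(example))  # {'T1': [(150, 200), (300, 350)]}
```

A transcript with several UTR exons ends up with several entries in its list, one per GTF row.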

ADD REPLY • link 8.8 years ago by Sean Davis 27k