Hello,
I am trying to retrieve genomic 3' UTR start and end positions using the Ensembl BioMart tool. However, some transcripts appear several times under the same transcript ID but with different 3' UTR annotations.
e.g.:
Transcript ID 3' UTR start 3' UTR end
ENST00000474604 32793169 32793300
ENST00000474604 32792445 32792726
ENST00000474604 32791848 32791958
ENST00000474604 32791565 32791596
ENST00000474604 32790888 32791376
Is there any way to get BioMart (or any other Ensembl tool) to print the stable ID with its version increment (e.g. ENST00000474604.1, ENST00000474604.2, etc.)? Without it, working with this data is going to be a bit of a hassle.
Also, I have downloaded the full cDNA set for Homo sapiens (GRCh38) for processing, and this includes the version increment. Ideally, I would like to easily map the records in the cDNA FASTA file to the genomic coordinates obtained from BioMart.
I have combed through BioMart for an option to include the transcript version number, but I can't find one. I am confused as to why this information would be omitted from the output.
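One way to line up the versioned FASTA IDs with the unversioned BioMart IDs is to split the version off the FASTA header before joining. The following is only a minimal sketch: the function names are hypothetical, and it assumes the first whitespace-delimited token of each header is the versioned transcript ID (e.g. ENST00000474604.1).

```python
def split_versioned_id(identifier):
    """Split 'ENST00000474604.1' into ('ENST00000474604', 1).

    Returns (identifier, None) when no version suffix is present.
    """
    stable_id, sep, version = identifier.partition(".")
    return stable_id, (int(version) if sep else None)


def fasta_ids(lines):
    """Yield (stable_id, version) for every FASTA header line.

    Assumption: the versioned transcript ID is the first token after '>'.
    """
    for line in lines:
        if line.startswith(">"):
            first_token = line[1:].split()[0]
            yield split_versioned_id(first_token)


# Illustrative header only; real headers carry more fields after the ID.
example = [">ENST00000474604.1 cdna", "ACGTACGT"]
ids = list(fasta_ids(example))
```

The unversioned half of each pair can then be used as the key when joining against the BioMart table.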
Thanks
EDIT: My question stemmed from an interpretation of the BioMart output that was based on a misconception. See Sean Davis's post for more details.
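For anyone hitting the same table: the sketch below collapses the example rows by transcript ID. It rests on an assumption that is not spelled out in this post, namely that rows sharing an ID are separate segments of one transcript's 3' UTR rather than different versions of the transcript. The function name is hypothetical.

```python
from collections import defaultdict

# Rows copied from the example table: (transcript ID, 3' UTR start, 3' UTR end).
rows = [
    ("ENST00000474604", 32793169, 32793300),
    ("ENST00000474604", 32792445, 32792726),
    ("ENST00000474604", 32791848, 32791958),
    ("ENST00000474604", 32791565, 32791596),
    ("ENST00000474604", 32790888, 32791376),
]


def collapse_utr_rows(rows):
    """Group rows by transcript ID; report sorted segments and overall span."""
    segments = defaultdict(list)
    for transcript_id, start, end in rows:
        segments[transcript_id].append((start, end))

    summary = {}
    for transcript_id, segs in segments.items():
        segs.sort()  # ascending genomic order
        summary[transcript_id] = {
            "segments": segs,
            # Smallest start to largest end across all segments.
            "span": (segs[0][0], max(end for _, end in segs)),
        }
    return summary


summary = collapse_utr_rows(rows)
```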
Hi,
If you download the GENCODE GTF, it has the Ensembl gene/transcript IDs, and they are versioned.
Also, if you go to the UCSC Table Browser and use the All GENCODE v23/22/20 tracks, you can get versioned ENST IDs. And of course you can use the Table Browser to select just the UTR info (as BED).
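A rough sketch of the GTF route, in case it helps. It assumes the standard nine tab-separated GTF columns, that UTR rows carry the feature type `UTR` (check your release and pass a different `feature_types` if it differs), and that the attributes column contains `transcript_id "..."`. The function name is hypothetical. It does not separate 5' from 3' UTRs; that would need a comparison against the CDS position.

```python
import re

# Matches the transcript_id attribute, e.g. transcript_id "ENST00000474604.1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def utr_records(gtf_lines, feature_types=("UTR",)):
    """Yield (transcript_id, chrom, start, end, strand) for UTR rows.

    Assumption: the feature column (3rd field) uses one of feature_types.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in feature_types:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        yield (match.group(1), fields[0], int(fields[3]), int(fields[4]), fields[6])
```

With a downloaded file this could be used as `with open(path) as handle: records = list(utr_records(handle))`, where `path` points at the uncompressed GTF.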
Thank you very much Amitm, I will try what you have suggested