9.9 years ago
Gjain
Hi Everyone,
I am looking for a gene annotation file in the GTF2.2 format. Specifically, I need the 5UTR, CDS and 3UTR annotations for the genes.
The format I need is the one described on the GTF2.2 readme page (excerpt shown below):
140 Twinscan 3UTR 65149 65487 . - . gene_id "140.000"; transcript_id "140.000.1";
140 Twinscan CDS 71696 71807 . - 0 gene_id "140.000"; transcript_id "140.000.1";
140 Twinscan start_codon 73222 73222 . - 2 gene_id "140.000"; transcript_id "140.000.1";
140 Twinscan CDS 73222 73222 . - 0 gene_id "140.000"; transcript_id "140.000.1";
140 Twinscan 5UTR 73223 73504 . - . gene_id "140.000"; transcript_id "140.000.1";
I have downloaded the current GTF file from Ensembl: ftp://ftp.ensembl.org/pub/release-78/gtf/mus_musculus/
Can someone please point me in the right direction for getting the above-mentioned features for the genes?
Thanks in advance.
The Ensembl GTFs are compatible with GTF2.2. If you need the 3UTR and 5UTR lines, you can generate them with either BioMart or the appropriate TxDb package in R and a bit of typing.
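For anyone who would rather script it, here is a minimal Python sketch of the underlying idea: the UTRs of a transcript are the parts of its exons that are not covered by CDS or stop_codon features, labelled 5UTR or 3UTR according to strand. This is not the BioMart or TxDb route suggested above, just an illustration. The function names (`subtract`, `utr_features`) and the exon coordinates in the example are made up for the demonstration; only the CDS coordinates come from the excerpt in the question, and a real script would first have to group exon, CDS and stop_codon lines by transcript_id.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates, as in GTF


def subtract(exon: Interval, coding: List[Interval]) -> List[Interval]:
    """Return the parts of `exon` not covered by any interval in `coding`."""
    start, end = exon
    pieces = []
    cursor = start
    for c_start, c_end in sorted(coding):
        if c_end < cursor or c_start > end:
            continue  # this coding interval does not touch the remaining exon
        if c_start > cursor:
            pieces.append((cursor, c_start - 1))
        cursor = max(cursor, c_end + 1)
    if cursor <= end:
        pieces.append((cursor, end))
    return pieces


def utr_features(exons: List[Interval], coding: List[Interval],
                 strand: str) -> List[Tuple[str, int, int]]:
    """Label non-coding exon pieces of one transcript as 5UTR or 3UTR.

    `coding` should hold the CDS intervals plus the stop_codon interval(s),
    because a GTF2.2 3UTR does not include the stop codon.
    """
    coding_start = min(s for s, _ in coding)
    coding_end = max(e for _, e in coding)
    features = []
    for exon in sorted(exons):
        for s, e in subtract(exon, coding):
            if e < coding_start:
                left_of_coding = True
            elif s > coding_end:
                left_of_coding = False
            else:
                continue  # gap between coding blocks inside an exon; not a UTR
            # On the + strand the 5' end is on the left; on the - strand it is on the right.
            is_five_prime = left_of_coding == (strand == "+")
            features.append(("5UTR" if is_five_prime else "3UTR", s, e))
    return features


# Hypothetical exons chosen so that the result matches the excerpt in the question.
exons = [(65149, 65487), (71696, 71807), (73222, 73504)]
coding = [(71696, 71807), (73222, 73222)]
print(utr_features(exons, coding, "-"))
# [('3UTR', 65149, 65487), ('5UTR', 73223, 73504)]
```

Non-coding transcripts have no CDS lines, so they would need to be skipped before calling `utr_features`.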
Thank you for your answer Devon.
The TxDb package is a good resource; I was not aware of it.
I had been looking in a different place in BioMart. After your suggestion, I searched again and found it using the query: