Question

Annotating an assembled GTF file

4.6 years ago

nattzy94 ▴ 60

I am assembling a GTF file from a BAM file that I generated by aligning my RNA-seq reads with STAR. The assembly was done using StringTie with the Ensembl annotation file for GRCh38.

My problem is that the resulting gtf file does not contain all the information that is in the reference annotation. Crucially, it is missing information on transcript biotype which I am interested in.

For instance the reference annotation has the following fields for a transcript:

 1       havana  exon    12975   13052   .       +       .       gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "4"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001799933"; exon_version "2"; tag "basic"; transcript_support_level "NA";

However, my assembled gtf file looks like this:

1       StringTie       exon    12613   12721   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";

I've also tried searching the entire file for "transcript_biotype" but nothing comes up.

From this previous post, I saw that a potential fix might be to convert the GTF to BED12 and then annotate the BED12 file using the Ensembl annotation file. However, I'm not sure exactly which bedtools function to use.

It would be great if anyone could point to a different solution.

RNA-Seq Assembly • 1.3k views

updated 4.6 years ago by PeiwenLi • 0 • written 4.6 years ago by nattzy94 ▴ 60

Hey, same question here. Have you solved it?

4.2 years ago by JRS • 0

score 0 · Answer 1 · 2020-04-27

4.6 years ago

PeiwenLi • 0

Hi! I am trying to do the exact same task as you and I found this post: Gene feature information missing in Stringtie merged assembly. It may be helpful!
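For transcripts that keep their Ensembl ID in the StringTie output (as `ENST00000456328` does in the example line from the question), one possible workaround that avoids bedtools is to look the biotype up in the reference GTF by `transcript_id` and append it. The following is only a rough sketch under that assumption, not a tested pipeline: the function names are made up for illustration, it assumes tab-separated GTF lines with quoted attribute values as in the Ensembl file, and novel transcripts (e.g. `MSTRG.*` IDs) are left unchanged because the reference has no biotype for them.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict (a repeated key keeps its last value)."""
    return dict(ATTR_RE.findall(field))


def load_biotypes(ref_lines):
    """Map transcript_id -> transcript_biotype from reference GTF lines."""
    biotypes = {}
    for line in ref_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        biotype = attrs.get("transcript_biotype")
        if tid and biotype:
            # Keep the first biotype seen for each transcript.
            biotypes.setdefault(tid, biotype)
    return biotypes


def annotate(assembled_lines, biotypes):
    """Yield assembled GTF lines, adding transcript_biotype where the ID is known."""
    for line in assembled_lines:
        stripped = line.rstrip("\n")
        cols = stripped.split("\t")
        if stripped.startswith("#") or len(cols) < 9:
            yield stripped  # header or malformed line: pass through untouched
            continue
        tid = parse_attributes(cols[8]).get("transcript_id")
        biotype = biotypes.get(tid)
        if biotype and "transcript_biotype" not in cols[8]:
            cols[8] = cols[8].rstrip() + f' transcript_biotype "{biotype}";'
        yield "\t".join(cols)
```

To use it, open the reference and assembled GTF files (the paths are up to you), pass the reference file object to `load_biotypes`, and write each line yielded by `annotate` to a new file followed by a newline. Matching on `transcript_id` rather than on coordinates means overlapping genes cannot be mixed up, but it also means nothing is inferred for newly assembled transcripts.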
