Hi,
I am studying the transcriptome of Arabidopsis. Interestingly, the annotated 5' UTRs are often longer than what my data support.
Here is an example. You can see that the RNA-seq reads cover only a small part of the annotated 5' UTR. Is there a way to fix that? I would like to get a GTF with the shorter isoform, derived from my RNA-seq data.
Thanks!
Yes, this has been verified in many data sets, using many different aligners. Is there a tool or method to systematically fix this type of annotation issue (i.e. give me a new GTF with updated 5' UTRs for all genes)?
It seems that you want to generate a consensus transcriptome for your samples. You can use the StringTie transcript assembler for this purpose. Specifically, after assembling each sample, run stringtie --merge to generate a merged (consensus) GTF file from your samples, which will contain the consensus isoforms. You should use this GTF file for further differential analysis.
Here's the link for the workflow: https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual
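In case it helps, here is a rough sketch of that workflow as a small Python script that builds the StringTie command lines: one reference-guided assembly per sample, then one merge. The helper names (assemble_cmd, merge_cmd, build_pipeline, run_pipeline) and the file names are made up for illustration, and I am assuming the BAM files are coordinate-sorted and that stringtie is on your PATH. One caveat, as far as I understand it: passing -G to the merge step keeps the reference transcripts in the merged output, so the long annotated 5' UTRs may still appear. Leaving -G out of the merge should give a set based only on the assembled transcripts.

```python
import subprocess
from pathlib import Path


def assemble_cmd(bam, guide_gtf, out_gtf, threads=4):
    """Build the command for one reference-guided assembly (one sample)."""
    # The BAM is assumed to be coordinate-sorted, which StringTie expects.
    return ["stringtie", str(bam), "-G", str(guide_gtf),
            "-o", str(out_gtf), "-p", str(threads)]


def merge_cmd(sample_gtfs, merged_gtf, guide_gtf=None):
    """Build the 'stringtie --merge' command over the per-sample GTFs."""
    cmd = ["stringtie", "--merge", "-o", str(merged_gtf)]
    if guide_gtf is not None:
        # Optional: include the reference annotation in the merge.
        cmd += ["-G", str(guide_gtf)]
    return cmd + [str(g) for g in sample_gtfs]


def build_pipeline(bams, guide_gtf, out_dir, merge_with_guide=True):
    """Return the list of commands: one assembly per BAM, then one merge."""
    out_dir = Path(out_dir)
    cmds, sample_gtfs = [], []
    for bam in bams:
        out_gtf = out_dir / (Path(bam).stem + ".gtf")
        cmds.append(assemble_cmd(bam, guide_gtf, out_gtf))
        sample_gtfs.append(out_gtf)
    guide = guide_gtf if merge_with_guide else None
    cmds.append(merge_cmd(sample_gtfs, out_dir / "merged.gtf", guide))
    return cmds


def run_pipeline(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, build_pipeline(["s1.bam", "s2.bam"], "ref.gtf", "out", merge_with_guide=False) returns three commands (two assemblies and a merge without -G), which you can inspect before handing them to run_pipeline. The output directory has to exist before running.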