I have transcript models in GTF format, built by assembling RNA-Seq read alignments with StringTie and with Cufflinks. The reference genome is a new blueberry assembly I'm helping to annotate.
For the next steps, I need to find the genomic coordinates of the start and end of translation for the longest open reading frame (ORF) in each transcript model.
Is there a program that can help me do that?
Previously, someone in my group used a Perl program called Tau to do this, but I don't have a copy of it. (Embarrassing!)
It seems like a straightforward task. Surely someone has written something like this and put it on GitHub or somewhere?
If not, do you think this would be a good project for a student learning bioinformatics?
Hi, I encountered a problem similar to what you described. I need to find the CDS (start/end codon) in my transcript models in GFF3 format. Have you solved it? Anyway, thanks!
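In case it helps anyone landing here, the task in the question can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for a tested tool. It assumes you already have each transcript's spliced sequence in transcript orientation (reverse-complemented for minus-strand models, as a tool such as gffread can produce) along with its exon coordinates from the GTF. It also assumes an ORF means ATG through the first in-frame stop codon on the transcript's own strand. The function names (`longest_orf`, `transcript_to_genome`, `orf_genomic_bounds`) are made up for this example.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) of the longest ATG..stop ORF
    in the three forward frames of a spliced transcript, stop codon included.
    Returns None if no complete ORF is found."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


def transcript_to_genome(offset: int, exons: List[Tuple[int, int]], strand: str) -> int:
    """Map a 0-based transcript offset to a 1-based genomic position.
    exons are (start, end) pairs, 1-based inclusive, as in GTF."""
    # Walk exons in transcript order: ascending for '+', descending for '-'.
    for start, end in sorted(exons, reverse=(strand == "-")):
        length = end - start + 1
        if offset < length:
            return start + offset if strand == "+" else end - offset
        offset -= length
    raise ValueError("offset lies beyond the end of the transcript")


def orf_genomic_bounds(seq: str, exons: List[Tuple[int, int]], strand: str) -> Optional[Tuple[int, int]]:
    """Genomic position of the first base of the start codon and of the
    last base of the stop codon for the longest ORF, or None."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    first = transcript_to_genome(orf[0], exons, strand)
    last = transcript_to_genome(orf[1] - 1, exons, strand)
    return first, last


# Toy two-exon transcript; the coordinates and sequence are invented.
exons = [(101, 106), (201, 209)]
print(orf_genomic_bounds("CCATGAAACCCTAGG", exons, "+"))
print(orf_genomic_bounds("CCATGAAACCCTAGG", exons, "-"))
```

For minus-strand transcripts the first value is larger than the second, because translation runs toward lower genomic coordinates. The sketch ignores ORFs that run off the end of the transcript without a stop codon, which a real annotation workflow would probably want to handle.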