Hi all,
I have a custom annotation file (GTF) that was made against an older version of a genome, and I want to update those annotations so I can use them with the newest genome version. The annotations are simple, ungapped genomic ranges that vary from ~150 bp to ~250,000 bp in length. I am working with the model organism Tetrahymena thermophila, whose genome assembly changes every few years as new sequencing experiments are completed.
I have started writing a custom Python script that extracts the sequences from the old genome, BLATs them against the new genome, and pulls out the new coordinates, but the script is taking a long time to write. Is there already a program or tool for what I assume is a common task? Any suggestions?
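For the BLAT step described above, the part that turns alignments back into annotations could look roughly like the sketch below. It assumes BLAT was run on DNA queries with the default PSL output (21 tab-separated columns), that each query name is the annotation's ID, and that the annotations are ungapped, so the target span (tStart..tEnd) can be taken directly as the new range. The scoring formula, the `min_coverage` cutoff, and the GTF `source`/`feature` defaults are hypothetical choices for illustration, not anything from the original script.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str       # query name = annotation ID (assumption)
    chrom: str      # target scaffold/chromosome in the new assembly
    start: int      # 1-based inclusive (GTF convention)
    end: int        # 1-based inclusive
    strand: str     # "+" or "-" for DNA queries
    score: int
    coverage: float


def parse_psl_line(line):
    """Parse one PSL data line (21 tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    matches, mismatches, rep, n_count = int(f[0]), int(f[1]), int(f[2]), int(f[3])
    q_num_insert, t_num_insert = int(f[4]), int(f[6])
    strand, q_name, q_size = f[8], f[9], int(f[10])
    t_name, t_start, t_end = f[13], int(f[15]), int(f[16])
    # Simple heuristic score: aligned bases minus mismatches and gap openings.
    score = matches + rep - mismatches - q_num_insert - t_num_insert
    # Count N-aligned positions as covered, since old poly-N gaps may now be filled.
    coverage = (matches + rep + n_count) / q_size
    # PSL target coordinates are 0-based half-open; convert to 1-based inclusive.
    return Hit(q_name, t_name, t_start + 1, t_end, strand, score, coverage)


def best_hits(psl_lines, min_coverage=0.95):
    """Keep the highest-scoring hit per query that covers enough of the query."""
    best = {}
    for line in psl_lines:
        # Skip blank lines and psLayout header lines (data lines start with a number).
        if not line.strip() or not line.split("\t")[0].isdigit():
            continue
        hit = parse_psl_line(line)
        if hit.coverage < min_coverage:
            continue
        if hit.name not in best or hit.score > best[hit.name].score:
            best[hit.name] = hit
    return best


def to_gtf_line(hit, source="blat_lift", feature="exon"):
    """Format a hit as a GTF line; source and feature names are placeholders."""
    attrs = f'gene_id "{hit.name}"; transcript_id "{hit.name}";'
    return "\t".join([hit.chrom, source, feature, str(hit.start), str(hit.end),
                      ".", hit.strand, ".", attrs])
```

Annotations that end up with no hit above the cutoff, or with several near-equal hits, are the ones worth inspecting by hand rather than trusting the "best" hit.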
Have you looked at RATT/PAGIT?
Thanks for this suggestion, I will keep this in mind if I need to work with a larger annotation set.
So we're not talking about 'normal' protein-coding genes here, right?
How different are the genomes (or how severe are the changes)?
Correct, we are not talking about 'normal' protein-coding genes. In fact, they are not genes at all; they are sRNA precursor ranges (from which sRNAs are produced).
From the answer below, I saw that 10 annotations had a different sequence in the new genome. For some of them the orientation of the scaffold (chromosome) had switched, and in others poly-N gaps had been filled in. So, just from looking at my annotations, I would say the genome versions are fairly different, but I don't know where the changes are officially documented.
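For the scaffolds whose orientation switched, a direct coordinate flip may be enough, but only under the assumption that the scaffold was simply reverse-complemented and is otherwise unchanged (same length, no filled gaps). With a scaffold of length L and a 1-based inclusive range [start, end], the flipped range is [L - end + 1, L - start + 1] on the opposite strand. The function name below is hypothetical; where poly-N gaps were filled the scaffold length changes and an alignment-based approach is still needed.

```python
def flip_interval(start, end, strand, scaffold_len):
    """Map a 1-based inclusive range onto the reverse complement of its scaffold.

    Assumes the scaffold was only reverse-complemented between assemblies,
    with its length and content otherwise unchanged.
    """
    if not (1 <= start <= end <= scaffold_len):
        raise ValueError("interval lies outside the scaffold")
    new_start = scaffold_len - end + 1
    new_end = scaffold_len - start + 1
    # Swap "+" and "-"; leave "." (unstranded) as-is.
    new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    return new_start, new_end, new_strand


print(flip_interval(1, 150, "+", 1000))  # (851, 1000, '-')
```

Applying the flip twice returns the original range, which makes a quick sanity check on any lifted file.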