I've got a list of SV calls, including breakpoints, and can easily enough winnow them down to those that are candidate gene fusions (intersecting a gene body or intron, same direction, strand, etc.).
Now I'd like to know whether they're predicted to give in-frame or out-of-frame fusions. So far, I'm unable to find any annotation tool that can do this in a straightforward manner. Visualization would also be nice, but isn't necessary.
Any suggestions?
Thanks for the suggestion - I've got Oncofuse up and running and the output makes sense. I'm a bit concerned about how it's dropping a large number of fusion candidates, though. I already have these events mapped to Ensembl transcripts and believe them to be valid. Are there options that will allow for retaining these events? Or is there a straightforward way to replace the RefSeq annotations with ones from Ensembl that may be more inclusive?
Indeed, it focuses on canonical transcripts from RefSeq, one per RefSeq gene. Extending Oncofuse to isoform level and dealing with junction mapping ambiguity will definitely require a substantial re-write. Adding other genes/transcripts will require additional rounds of annotation and feature selection.
As for your original post, I believe it is not that hard to write a script that tells you whether a junction combines exons that are in or out of frame. Tables downloaded from the UCSC Genome Browser for Ensembl genes and transcripts (Gencode V20/Ensembl 76) have exon frames. One has to compute exon remainders, which for 0-based coordinates are (end - start + frame) % 3, and check whether the 5' exon remainder corresponds to the 3' exon frame. Of course, the hard part is handling ambiguous cases.
Thanks for the response, Mikhail. I still have a handful of transcripts for which neither program annotates the frame, so your suggestions about calculating it myself may still come in handy. Thanks for a nice package and the advice!
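For anyone landing here later, the remainder check described above could be sketched roughly as follows. This is only a minimal illustration, not part of Oncofuse. The function names and the (start, end, frame) tuple layout are made up for the example. It assumes 0-based, half-open exon coordinates already clipped to the coding region, frames following the UCSC exonFrames convention (0, 1, or 2, with -1 for exons with no coding sequence), and a fusion that joins whole exons at their boundaries. Ambiguous junction mappings are not handled.

```python
def exon_remainder(start, end, frame):
    """Frame expected at the first base of whatever follows this exon.

    Assumes 0-based, half-open coordinates clipped to the CDS, and a
    frame of 0, 1 or 2 (UCSC exonFrames style; -1 means non-coding).
    """
    if frame not in (0, 1, 2):
        raise ValueError("exon has no coding frame: %r" % (frame,))
    return (end - start + frame) % 3


def is_in_frame(five_prime_exon, three_prime_exon):
    """True if the last retained 5' exon flows into the 3' exon in frame.

    Each exon is a hypothetical (start, end, frame) tuple.
    """
    start, end, frame = five_prime_exon
    three_prime_frame = three_prime_exon[2]
    if three_prime_frame not in (0, 1, 2):
        raise ValueError("3' exon has no coding frame")
    return exon_remainder(start, end, frame) == three_prime_frame


# Illustrative coordinates only, not real annotation.
upstream = (100, 200, 0)      # 100 coding bases, frame 0 -> remainder 1
downstream_a = (5000, 5090, 1)
downstream_b = (7000, 7090, 0)

print(is_in_frame(upstream, downstream_a))  # True
print(is_in_frame(upstream, downstream_b))  # False
```

Because the check only uses the exon length and its frame, it should not depend on strand as long as the frame refers to the first transcribed base of the exon. Exons that are partly UTR would need their coordinates clipped to the CDS before being passed in.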