(Note that this post originated here, and is now being moved over to the dedicated Oncofuse thread.) I've pasted below a case where Oncofuse (1.0.9b1) seems to get confused between a fusion and its reciprocal:
fusions.out 3378 EPI 5 0 chr1:16299533>chr1:16456083 ZBTB17 No Intron 14 24541 804 0 EPHA2 Yes Exon 2 155 891 1 1 0.005662748906153429 0.9999657760410134 -0.1151861020059215 BTB/POZ-like[Domain];BTB/POZ[Domain];Zinc finger C2H2-type/integrase DNA-binding domain[Domain];BTB/POZ fold[Domain];Zinc finger, C2H2-like[Domain];Zinc finger, C2H2[Domain] Serine-threonine/tyrosine-protein kinase catalytic domain[Domain];Sterile alpha motif, type 1[Domain];Immunoglobulin-like fold[Domain];Protein kinase, ATP binding site[Binding_site];Sterile alpha motif/pointed domain[Domain];Ephrin receptor ligand binding domain[Domain];Ephrin receptor, transmembrane domain[Domain];Tyrosine-protein kinase, active site[Active_site];Insulin-like growth factor binding protein, N-terminal[Domain];Protein kinase-like domain[Domain];Sterile alpha motif domain[Domain];Galactose-binding domain-like[Domain];Protein kinase domain[Domain];Tyrosine-protein kinase, catalytic domain[Domain];Ephrin receptor type-A /type-B[Family];Tyrosine-protein kinase, receptor class V, conserved site[Conserved_site];Fibronectin, type III[Domain] EPHA2;PIK3R1;SLA;EFNA1;EFNA4 0.017305156526071477 0.0 3.7249030480169165E-5 0.08791482785307511 0.0187250351050042 0.09058808633418107
In this sample output, ZBTB17 is given as the 5' partner and EPHA2 as the 3' partner, with the following breakpoint: chr1:16299533>chr1:16456083 (I have confirmed that this orientation is correct based on manual inspection of the alignment). The first problem is that the 5_Segment_ID (14) and 3_Segment_ID (2) do not seem to be aware that both genes are on the minus strand and hence the exon number should rather be 2 (ZBTB17) and 16 (EPHA2). This incorrect choice of exons seems to be reflected in the list of protein domains retained for the two fusion partners. Since the 5' breakpoint is in the 5'UTR of ZBTB17, it shouldn't retain any domains, and a quick inspection of the domain architecture of EPHA2 indicates that only possibly the Sterile-Alpha Motif should be retained given that only the last ~100 amino acids are retained in the fusion. That this record reports such a confident prediction of "driver" (>0.9999) seems to be attributable to the list of retained domains, which now seems highly questionable. This might be reasonable for the reciprocal fusion, in which EPHA2 is the 5' partner and ZBTB17 is the 3', but that should not be the case for this record. Have I misinterpreted the output?
Note that I am running Oncofuse with input_type "tophat".
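The strand issue described above comes down to a simple index conversion, sketched here in Python. This is a minimal illustration and not Oncofuse's actual code: the function name is made up, and the segment counts in the usage lines are hypothetical values chosen only so that the arithmetic reproduces the numbers quoted in the post (14 becomes 2, 2 becomes 16). Real counts would have to come from the gene annotation.

```python
def transcript_segment_number(genomic_index: int, segment_count: int, strand: str) -> int:
    """Convert a 1-based exon (or intron) index counted in ascending genomic
    order into the number counted along the transcript, 5' to 3'.

    For a plus-strand gene the two numberings agree; for a minus-strand gene
    the transcript runs against the reference, so the order is reversed.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 1 <= genomic_index <= segment_count:
        raise ValueError("genomic_index out of range")
    if strand == "+":
        return genomic_index
    return segment_count - genomic_index + 1


# Hypothetical segment counts (assumptions, not taken from any annotation):
print(transcript_segment_number(14, 15, "-"))  # minus strand: 14 of 15 becomes 2
print(transcript_segment_number(2, 17, "-"))   # minus strand: 2 of 17 becomes 16
```

If the reported segment IDs were counted in genomic order, a conversion like this would be needed before looking up which domains lie upstream or downstream of the breakpoint.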
Hello, Mike. I've been using Oncofuse on Tophat-fusion-post output and I'm excited about the results. I had a couple of questions about the criteria for judging reported fusion events and sifting out false positives, and I was wondering if you could help or point me in the right direction.
I'm finding that a really high percentage (easily 75% or more) of the reported events have one or both partners 'backwards', resulting in a head-to-head or tail-to-tail fusion transcript that's probably riddled with stop codons and often even missing a start.
Oncofuse (and for that matter Tophat) doesn't look for or report 'backwards' reading when scoring functional domains/driver probability/etc of fusion partners. I know of one example of a head-to-head fusion being functional, but it seems like it would be extremely rare for this kind of event to even give a productive transcript. Am I wrong in that impression? Have you thought about adding something to Oncofuse to filter or flag these by comparing orientation to strand and order of the partners?
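The kind of check suggested here, comparing junction orientation to gene strand, could look roughly like the Python sketch below. It is only an illustration under stated assumptions and not how Oncofuse or Tophat implement anything: it assumes a two-letter orientation code in which 'f' means that side of the junction is read in ascending reference coordinates and 'r' in descending coordinates, and it assumes the gene strands come from an annotation. The function names and labels are made up.

```python
def is_sense(read_direction: str, gene_strand: str) -> bool:
    """True when one side of the junction is read along the gene's own strand."""
    return (read_direction == "f") == (gene_strand == "+")


def classify_junction(orientation: str, strand_first: str, strand_second: str) -> str:
    """Classify a junction from its orientation code and the two gene strands.

    'sense-sense'          : both genes read 5'->3', first gene is the 5' partner
    'antisense-antisense'  : reverse complement is consistent, second gene is 5'
    'discordant'           : one partner is read backwards relative to the other
    """
    if len(orientation) != 2 or any(c not in "fr" for c in orientation):
        raise ValueError("orientation must be one of ff, fr, rf, rr")
    for strand in (strand_first, strand_second):
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")

    first = is_sense(orientation[0], strand_first)
    second = is_sense(orientation[1], strand_second)
    if first and second:
        return "sense-sense"
    if not first and not second:
        return "antisense-antisense"
    return "discordant"


print(classify_junction("ff", "+", "+"))  # sense-sense
print(classify_junction("ff", "-", "-"))  # antisense-antisense
print(classify_junction("ff", "+", "-"))  # discordant
```

Under these assumptions, the 'discordant' class corresponds to the 'backwards' candidates described above, and flagging it rather than deleting it keeps rare functional exceptions visible.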
Kind of similarly, most of my list has one or both partners with an intronic breakpoint, often hundreds or thousands of bp away from the exon boundaries. Again, it seems like even if these events are real readthroughs or something, they would almost never produce a functional transcript. I've been checking them anyway, but I'm wondering if it would be more appropriate to just filter them out using Oncofuse?
Thanks!
Hello!
Sorry for the rather late reply. Those are very interesting suggestions, and here is what I think about those questions:
Could you please clarify whether these are genes from the same chromosome that look head-to-head/tail-to-tail, or whether this is seen in the reads containing the fusion junction? In the former case, this could easily happen through chromosomal inversion, which is a relatively common mutation. As for the latter case, I've implemented this kind of filtering; see the latest pre-release here. Please let me know if it works fine for your task.
As a follower of the theory that breakpoints happen at random (e.g. see this paper) and that oncogenic ones are then selected based on their driver potential, I think one would expect lots of them in introns. And since even the best RNA-seq gives a lot of intron coverage, those intronic breakpoints could actually be true events, present in unspliced mRNA. So I don't think filtering them would be a very good idea, and if they make it into the final list of fusion candidates, some manual verification would be needed.
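In line with keeping intronic breakpoints but checking them by hand, one option is to annotate each candidate with its distance to the nearest exon boundary instead of dropping it. The Python sketch below is only an illustration: the function names, the `warn_distance` threshold and the toy exon coordinates are all assumptions, and nothing here reflects what Oncofuse does internally.

```python
from typing import Dict, List, Tuple, Union


def distance_to_nearest_exon(breakpoint: int, exons: List[Tuple[int, int]]) -> int:
    """Return 0 if the breakpoint lies inside an exon (inclusive coordinates),
    otherwise the distance in bp to the closest exon edge."""
    if not exons:
        raise ValueError("exon list is empty")
    best = None
    for start, end in exons:
        if start <= breakpoint <= end:
            return 0
        d = min(abs(breakpoint - start), abs(breakpoint - end))
        best = d if best is None else min(best, d)
    return best


def annotate_breakpoint(
    breakpoint: int, exons: List[Tuple[int, int]], warn_distance: int = 1000
) -> Dict[str, Union[int, str]]:
    """Label a breakpoint for manual review rather than filtering it out."""
    d = distance_to_nearest_exon(breakpoint, exons)
    if d == 0:
        label = "exonic"
    elif d <= warn_distance:
        label = "non-exonic, near exon"
    else:
        label = "non-exonic, far from exon"
    return {"distance": d, "label": label}


# Toy coordinates, purely illustrative:
toy_exons = [(100, 200), (500, 600)]
for bp in (150, 350, 2000):
    print(bp, annotate_breakpoint(bp, toy_exons))
```

Sorting the candidate list by this distance puts the breakpoints closest to an exon first when prioritising manual checks.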
Hey!
You're quite right, and I should have specified that it was the latter. I tried the new version and the results look excellent - all of the ones I've checked have had both partners in 'correct' orientation. I was off a bit: the 'backwards' candidates were two-thirds rather than three-quarters of the reported events. Thank you so much for writing this, it's a massive time-saver over checking them manually and I'm going to recommend this to some people.
I see your point. I'd wondered if prespliced mRNA contributed significantly to reads for reported fusions, so that's good to know. Rechecking the candidates with intronic breakpoints that I was filtering before, I've already found a couple with promising partners. I've noticed in some papers (like this one) they do filter to purely exonic reads - do you think this is just time-saving since they're handling so much data?
Thanks again :)
Great!
As for the Nat Biotech paper, I agree with you. I don't think it's possible to validate data from 675 cell lines, so they've taken this as a precautionary step, together with selecting only discordant paired alignments that map to different chromosomes, etc. Btw, thanks for sharing this paper; the data is great for benchmarking tools like Oncofuse based on fusion recurrence criteria.
Hello!
Finally managed to pack and upload a new version that supports Tophat-post. Please check the website mentioned above.
Regards,
Mike