Can someone tell me how to distinguish a chimeric transcript from a fusion gene? I am very confused.
I recently built a transcriptome assembly with the Velvet/Oases pipeline and BLASTed it against the reference genome. Some of the transcripts hit two different chromosomes, and some hit the same chromosome but at different locations. Now I am wondering whether I should keep these chimeric transcripts in my final assembly or throw them away.
Thanks, Upendra
Can you clarify what you mean by a "fusion gene"? Searching for chimeras is fairly straightforward. There could be many different reasons why your reads are hitting different locations.
By fusion gene I mean transcripts that join parts of two different genes as a result of a translocation, deletion, etc. This is biological. What worries me is that if I throw away the chimeric transcripts, I may be throwing away biologically interesting genes. On the other hand, if I keep them, I am worried that they will cause major artefacts in analyses downstream of the transcriptome assembly, such as detection of sequence or expression variation.
Splitting the problem in two and solving each part separately would probably give the best results:
1) First use Velvet/Oases and throw away the transcripts that map to two different chromosomes (most of these are likely false-positive fusion genes caused by assembly errors).
2) To find fusion genes, use a specifically designed tool, for example FusionCatcher http://code.google.com/p/fusioncatcher/ (it has very good sensitivity and specificity for finding fusion genes).
3) Use the results from step 1 and step 2 together.
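For step 1, flagging the candidate transcripts could look roughly like the sketch below. It assumes the BLAST results were saved in tabular format (-outfmt 6, with the default 12 columns). The function names and the thresholds max_query_overlap and max_gap are arbitrary illustrative choices, not part of Velvet/Oases or FusionCatcher. Two hits only count as evidence of a chimera when they cover different parts of the transcript, because the same part hitting two chromosomes is more likely a paralog or repeat.

```python
from collections import defaultdict
from itertools import combinations


def parse_outfmt6(lines):
    """Group BLAST tabular (-outfmt 6) hits by query (transcript) ID."""
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qseqid, sseqid = f[0], f[1]
        # Columns 7-10 are qstart, qend, sstart, send; sort each pair so that
        # minus-strand hits (sstart > send) are handled uniformly.
        qstart, qend = sorted((int(f[6]), int(f[7])))
        sstart, send = sorted((int(f[8]), int(f[9])))
        hits[qseqid].append((sseqid, qstart, qend, sstart, send))
    return hits


def classify(hit_list, max_query_overlap=20, max_gap=500_000):
    """Label one transcript from its list of hits.

    max_query_overlap and max_gap are assumed thresholds; tune them for
    your organism (max_gap should exceed the longest plausible intron).
    """
    label = "consistent"
    for a, b in combinations(hit_list, 2):
        overlap = min(a[2], b[2]) - max(a[1], b[1]) + 1
        if overlap > max_query_overlap:
            # Same part of the transcript: likely paralog/repeat, skip.
            continue
        if a[0] != b[0]:
            return "multi_chromosome"
        # Distance between the two aligned regions on the same chromosome.
        gap = max(a[3], b[3]) - min(a[4], b[4])
        if gap > max_gap:
            label = "same_chromosome_distant"
    return label


def flag_chimeras(lines, **thresholds):
    """Return {transcript_id: label} for every query in the BLAST output."""
    return {q: classify(h, **thresholds) for q, h in parse_outfmt6(lines).items()}
```

Typical use would be flag_chimeras(open("hits.tsv")), where hits.tsv is a hypothetical file name. Transcripts labelled "multi_chromosome" are the ones step 1 would discard. The "same_chromosome_distant" ones are worth inspecting by hand before deciding.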
This is common in de novo assembly. For a large genome, most chimeric contigs are caused by misassembly (including misassembly in the reference genome) rather than anything biological.
I agree with you regarding the reference genome. What about de novo assembly? What about transcriptome assembly? Keep them or not?