I have a question about a de novo assembly and the transcripts obtained from it.
After the de novo assembly, I BLASTed the transcripts against the reference transcriptome (from a different subspecies) to see whether I could find any novel transcripts. I then extracted the novel transcripts (those with no hit) and BLASTed them against the reference genome (again from a different subspecies) to make sure they are present in the genome. Surprisingly, only 77% of these novel transcripts hit the genome. I later BLASTed the transcripts with no genome hit against a related organism, and all except a few have a hit there (the ones without a hit are human contamination).
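The filtering step above (separating transcripts with and without a hit, then computing what fraction hit) can be sketched in Python. This is a minimal sketch, assuming BLAST+ tabular output (`-outfmt 6` with the default 12 columns, where column 1 is the query ID and column 11 is the e-value) and an e-value cutoff of 1e-5 chosen purely for illustration; the function names are hypothetical. Queries with no hit never appear in `-outfmt 6` output, so the "no hit" set has to be computed against the IDs in the query FASTA.

```python
"""Sketch: find query transcripts with no BLAST hit.

Assumes BLAST+ tabular output (-outfmt 6, default 12 columns) and the FASTA
of the queried transcripts. The e-value cutoff is a hypothetical example.
"""
from typing import Iterable, Set, Tuple


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Return sequence IDs (first word after '>') from FASTA lines."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def ids_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return query IDs with at least one hit at or below max_evalue.

    In -outfmt 6, column 1 is the query ID and column 11 is the e-value.
    """
    hits: Set[str] = set()
    for line in lines:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:
            hits.add(cols[0])
    return hits


def split_by_hit(all_ids: Set[str], hit_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split queries into (with_hit, without_hit)."""
    with_hit = all_ids & hit_ids
    return with_hit, all_ids - with_hit


if __name__ == "__main__":
    fasta = [">tx1 len=500", "ACGT", ">tx2", "ACGT", ">tx3", "ACGT", ">tx4", "ACGT"]
    blast = [
        "tx1\tchr1\t98.0\t500\t10\t0\t1\t500\t100\t599\t1e-50\t900",
        "tx2\tchr2\t85.0\t120\t18\t2\t1\t120\t50\t169\t0.01\t60",
        "tx3\tchr3\t99.0\t400\t4\t0\t1\t400\t10\t409\t0.0\t780",
    ]
    found, missing = split_by_hit(fasta_ids(fasta), ids_with_hits(blast))
    total = len(found) + len(missing)
    print(f"{len(found)}/{total} ({100 * len(found) / total:.0f}%) with a hit")
    print("no hit:", sorted(missing))
```

In practice you may also want to require a minimum alignment length or query coverage, since a short partial hit can make a transcript look "present" in the genome.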
My question now is: what are these transcripts that have no hit at either the transcriptome or the genome level but do hit a related organism?
Thanks, Dk, for your quick response. I agree that it might simply be a case of misassembly in both the transcriptome and the genome. We plan to validate some of these transcripts using RT-PCR, and I am wondering: are there any alternative ways to validate them?
Also, if I understand correctly, the transcripts that hit neither the transcriptome nor the genome are cases of misassembly or incomplete assembly.
But what are the transcripts that have no hit to the transcriptome? And
what are the transcripts that have no hit to the transcriptome but do hit the genome?
Thanks very much in advance.
OK, I will try to answer my own question here.
Transcripts that do not hit the reference transcriptome are probably novel, because the transcriptome under my study conditions differs from the reference transcriptome, which is based on the reference genome.
Transcripts that do not hit the reference transcriptome but do hit the reference genome are also novel; they are missing from the reference transcriptome because of errors or gaps in its annotation.
Finally, transcripts that hit neither the reference transcriptome nor the reference genome are novel, and they are missing because the genome assembly is incomplete or misassembled.
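The three cases above, plus the contamination case from the original question, can be sketched as a simple binning step. This is a minimal sketch, assuming each hit set was built from a separate BLAST search (for example by collecting query IDs from `-outfmt 6` output); the function name and label wording are hypothetical and simply mirror the interpretation proposed here, so they are hypotheses to test, not conclusions. The checks run in the same order as the filtering workflow, so a transcript that hits the transcriptome is labeled "known" even if it also hits the genome.

```python
"""Sketch: bin assembled transcripts by which BLAST searches they hit.

Assumes each hit set comes from a separate BLAST search. The label wording
is hypothetical and mirrors the interpretation above; it is not a verdict.
"""
from collections import Counter
from typing import Dict, Set


def classify(all_ids: Set[str], transcriptome_hits: Set[str],
             genome_hits: Set[str], related_hits: Set[str]) -> Dict[str, str]:
    """Assign one label per transcript, checking databases in workflow order:
    reference transcriptome, then reference genome, then related organism."""
    labels: Dict[str, str] = {}
    for tid in all_ids:
        if tid in transcriptome_hits:
            labels[tid] = "known: in reference transcriptome"
        elif tid in genome_hits:
            labels[tid] = "novel: in genome, missing from annotation"
        elif tid in related_hits:
            labels[tid] = "novel: not found in reference genome assembly"
        else:
            labels[tid] = "no hit: possible contamination"
    return labels


if __name__ == "__main__":
    ids = {"t1", "t2", "t3", "t4", "t5"}
    labels = classify(ids, transcriptome_hits={"t1", "t2"},
                      genome_hits={"t3"}, related_hits={"t4"})
    # Print how many transcripts fall into each bin.
    for label, n in sorted(Counter(labels.values()).items()):
        print(f"{n}\t{label}")
```

Transcripts in the "not found in reference genome assembly" bin are natural candidates for the RT-PCR validation mentioned earlier.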
What do people think?