How sensitive is BLAST at detecting the various events that lead to transcript isoforms? I've had decent success capturing orthologs for things like nonsense-mediated decay or highly truncated transcripts. However, for more subtle differences such as alternative splices, I am concerned that BLAST might not be sensitive enough to differentiate strongly between closely related isoforms.
My concern comes from how BLAST hits work: two transcripts could be identical up to a given point, so BLAST could generate hits against the same regions that score the same, or close enough to affect the selection of the best hit. Truncations and alternative splices are also a concern, since BLAST might not capture the order of splicing, making two very different transcripts appear almost identical.
Is BLAST sufficient for detecting orthologs of transcript isoforms, or is it going to be unreliable? I was wondering if it would be worth repeating a similar analysis but using Needleman-Wunsch to handle the alignments and scoring. The computational cost is much larger, but it might be better able to handle more conserved transcript isoforms.
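To make the local-versus-global distinction concrete, here is a minimal sketch with toy sequences. The "exons" are invented strings, the scoring parameters are arbitrary, and the local scorer is plain Smith-Waterman rather than BLAST's actual seeded heuristic, so treat it as an illustration of the idea only. A truncated query is compared against a truncated reference isoform and a full-length one that shares the same 5' exons.

```python
def _pair(x, y, match, mismatch):
    return match if x == y else mismatch

def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman best local alignment score (stand-in for a BLAST HSP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Hypothetical toy exons, not real sequence data.
exon1, exon2, exon3 = "ATGGCCAAGCTGACC", "GGTTCAGGATCC", "TTGACGCATAA"

query = exon1 + exon2              # truncated transcript from the assembly
ref_truncated = exon1 + exon2      # truncated reference isoform
ref_full = exon1 + exon2 + exon3   # full-length reference isoform

local_trunc = local_score(query, ref_truncated)
local_full = local_score(query, ref_full)
global_trunc = global_score(query, ref_truncated)
global_full = global_score(query, ref_full)

print("local :", local_trunc, local_full)
print("global:", global_trunc, global_full)
```

In this toy case the local score cannot separate the two references, because the query is fully contained in both, while the global score penalizes the unaligned third exon. Real data would add mismatches, UTR differences and assembly errors, so the gap between the two scores would be less clean.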
EDIT: My end goal is to map orthologous transcripts of genes to sequences found in my de novo RNA-Seq data. In other words, I want to go a level deeper and not just map my RNA-Seq data to genes, but annotate for transcripts as well.
Could you expand on where you want to get to? If you BLAST human splice variants against rodent ones, your HSPs and score rankings will be all over the shop, especially since many won't be 1:1 gene orthologs anyway. It's probably best to resolve the splices against the genes first, which is what Ensembl and UCSC do.
So you're saying I should BLAST the transcripts from my de novo assembled RNA-Seq data against genomic sequences? I'm not sure that would be any better; the BLAST scores would still be all over the place.
I realize that Gene:transcript is not 1:1, but I can link genes by mapping their transcripts. If two transcripts from different species are orthologs, then the genes should be orthologs as well.
My problem isn't that I can't get enough resolution to map transcripts to a given gene; I can always choose one sequence per reference gene and work with that. However, I then start throwing out huge portions of my dataset, because only one of possibly many transcripts will be mapped to each gene. My goal is to preserve the alternative transcript information and map transcripts to transcripts, but I am starting to doubt BLAST's ability to accomplish this task.
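One way transcript-to-transcript mapping could be made less dependent on raw BLAST scores is to compare how much of both the query and the subject is covered by HSPs, which is a rough sketch rather than a validated method. The tuples below are hypothetical, hand-written hits; in practice they would have to be parsed from tabular output requested with fields such as `qstart`, `qend`, `sstart`, `send`, `qlen` and `slen`. The names `merged_length` and `reciprocal_coverage` are made up for this example.

```python
from collections import defaultdict

def merged_length(intervals):
    """Number of positions covered by 1-based inclusive intervals, overlaps counted once."""
    total, end = 0, 0
    for start, stop in sorted(intervals):
        if stop > end:
            total += stop - max(start, end + 1) + 1
            end = stop
    return total

def reciprocal_coverage(hsps):
    """Map (query, subject) -> (fraction of query covered, fraction of subject covered)."""
    q_iv, s_iv, lengths = defaultdict(list), defaultdict(list), {}
    for q, s, qs, qe, ss, se, qlen, slen in hsps:
        pair = (q, s)
        # min/max guards against reversed coordinates on minus-strand hits
        q_iv[pair].append((min(qs, qe), max(qs, qe)))
        s_iv[pair].append((min(ss, se), max(ss, se)))
        lengths[pair] = (qlen, slen)
    return {
        pair: (merged_length(q_iv[pair]) / qlen, merged_length(s_iv[pair]) / slen)
        for pair, (qlen, slen) in lengths.items()
    }

# Hypothetical hits: (qseqid, sseqid, qstart, qend, sstart, send, qlen, slen)
hsps = [
    ("t1", "iso_skip", 1, 300, 1, 300, 300, 300),    # exon-skipped reference isoform
    ("t1", "iso_full", 1, 100, 1, 100, 300, 400),    # exon 1 of the full isoform
    ("t1", "iso_full", 101, 300, 201, 400, 300, 400) # exon 3 of the full isoform
]

coverage = reciprocal_coverage(hsps)
for pair, (qcov, scov) in sorted(coverage.items()):
    print(pair, round(qcov, 2), round(scov, 2))
```

Here the query is fully covered against both references, but only the exon-skipped isoform is fully covered in return; the full-length isoform has an uncovered middle exon. Requiring high coverage in both directions is an assumption about what "same isoform" should mean, and any threshold would need tuning on real data.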