I have a transcript assembly and I have tried to annotate it against human Ensembl cDNA using tblastx. I got the BLAST top hits as Ensembl transcript IDs, and I converted these transcript IDs to Ensembl gene IDs with the BioMart tool. But now many transcripts map to the same gene. If I want to keep only one transcript per gene, what criteria should I use? Should I choose based on transcript length, or on some other criterion?
For example:
Ensembl Gene ID    Ensembl Transcript ID
ENSG00000139618    ENST00000380152
ENSG00000139618    ENST00000530893
ENSG00000139618    ENST00000528762
ENSG00000139618    ENST00000470094
ENSG00000139618    ENST00000533776
ENSG00000139618    ENST00000544455
In the example above, which Ensembl transcript ID should I choose for the gene ENSG00000139618?
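To see which genes end up with more than one transcript after the BioMart conversion, the gene/transcript pairs can be grouped by gene. This is a minimal Python sketch that assumes the BioMart export has already been read into a list of `(gene_id, transcript_id)` tuples; the function name `transcripts_by_gene` is illustrative, and the rows are the ones from the table above.

```python
from collections import defaultdict

# (gene_id, transcript_id) pairs from the BioMart conversion; these are the
# rows from the example table in the question.
pairs = [
    ("ENSG00000139618", "ENST00000380152"),
    ("ENSG00000139618", "ENST00000530893"),
    ("ENSG00000139618", "ENST00000528762"),
    ("ENSG00000139618", "ENST00000470094"),
    ("ENSG00000139618", "ENST00000533776"),
    ("ENSG00000139618", "ENST00000544455"),
]


def transcripts_by_gene(rows):
    """Group transcript IDs under their gene ID, preserving input order."""
    groups = defaultdict(list)
    for gene_id, transcript_id in rows:
        groups[gene_id].append(transcript_id)
    return dict(groups)


groups = transcripts_by_gene(pairs)

# Report only genes that need a selection rule (more than one transcript).
for gene_id, transcripts in groups.items():
    if len(transcripts) > 1:
        print(f"{gene_id}: {len(transcripts)} transcripts -> {', '.join(transcripts)}")
```

Every gene printed by this loop is one where a rule for picking a single transcript is needed.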
The answer depends on what you need to accomplish. Think of it this way: if you selected one transcript at random, would that be acceptable? If not, why not? The answer to that question gives you the rule by which to select your transcript.
Why not simply keep the best BLAST hit? For each assembled sequence, you can assign only the single best Ensembl transcript ID.
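Keeping only the best hit per assembled sequence can be sketched as below. This assumes the tblastx search was run with tabular output (`-outfmt 6`, whose 12 default columns end in e-value and bit score) and ranks hits by bit score, breaking ties by the lower e-value; that ranking is one reasonable choice, not the only one. The contig names, subject IDs and scores in `EXAMPLE` are invented for illustration, and `best_hits` is a hypothetical helper.

```python
import csv
import io

# Column labels for the 12 default columns of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row}, keeping the highest-bit-score hit per query.

    Ties on bit score are broken by the lower e-value.
    """
    best = {}
    for values in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (such as those from -outfmt 7).
        if not values or values[0].startswith("#"):
            continue
        row = dict(zip(FIELDS, values))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        # Higher bit score wins; on a tie, the lower e-value wins.
        if current is None or (row["bitscore"], -row["evalue"]) > (
            current["bitscore"], -current["evalue"]
        ):
            best[row["qseqid"]] = row
    return best


# Hypothetical tblastx hits (invented values, hypothetical IDs).
EXAMPLE = (
    "contig_1\ttx_A\t98.5\t300\t4\t0\t1\t900\t100\t999\t1e-150\t520\n"
    "contig_1\ttx_B\t80.0\t200\t40\t1\t1\t600\t50\t649\t1e-60\t250\n"
    "contig_2\ttx_C\t95.0\t120\t6\t0\t10\t369\t1\t360\t3e-40\t160\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
for query, row in hits.items():
    print(query, row["sseqid"], row["bitscore"])
```

With a real results file, `open("results.tsv")` (a hypothetical filename) can be passed in place of the `io.StringIO` object.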
Thanks for the reply. How can I choose a representative transcript for a gene? Can I select one at random, or should I select the best transcript for each gene based on transcript length or the number of exons predicted for that transcript?