Good day everyone,
After a de novo transcriptome assembly from RNA-seq data, what would be your preferred method to find out what lies behind all those Trinity transcript IDs, given that a closely related species is already (more or less) well documented on NCBI?
Would you rather start from scratch with an annotation pipeline such as Trinotate? Or would you rather use the protein information of the closely related species and link each Trinity transcript ID to something already known, using blastx for example? I can't really decide which approach would be best, as I have no experience in this field.
Thank you for your input! Have a nice day,
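For the blastx route, here is a minimal Python sketch of the step that links Trinity transcript IDs to known proteins. It assumes you have already built a protein BLAST database from the related species' proteome and run blastx with tabular output in the default `-outfmt 6` column order. It keeps the highest-bitscore hit per transcript below an e-value cutoff. The file names in the comment, the function name `best_hits`, the 1e-5 cutoff and the sample rows are all illustrative assumptions, not a recommended setting.

```python
import csv
import io
from typing import Dict, Iterable, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
# Assumed to come from something like (file names are hypothetical):
#   blastx -query Trinity.fasta -db related_proteome -outfmt 6 -evalue 1e-5 -out blastx.tsv
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str      # protein ID in the related species' proteome
    pident: float     # percent identity of the alignment
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query (Trinity transcript ID)."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["sseqid"], float(rec["pident"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        if hit.evalue > max_evalue:
            continue  # too weak to use as a name
        current = best.get(rec["qseqid"])
        if current is None or hit.bitscore > current.bitscore:
            best[rec["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Made-up rows, only to show the input shape.
    example = io.StringIO(
        "TRINITY_DN1_c0_g1_i1\tprotA\t92.5\t200\t15\t0\t1\t600\t1\t200\t1e-90\t350\n"
        "TRINITY_DN1_c0_g1_i1\tprotB\t60.0\t180\t72\t2\t4\t540\t10\t190\t1e-30\t120\n"
        "TRINITY_DN2_c0_g1_i1\tprotC\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
    )
    for tid, hit in best_hits(example).items():
        print(tid, hit.subject, hit.pident)
```

With this sample, only `TRINITY_DN1_c0_g1_i1` is reported (best hit `protA`); the `TRINITY_DN2` hit is dropped by the e-value cutoff. On a real file you would pass `open("blastx.tsv", newline="")` instead of the `StringIO` object.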
Can you clarify your notion of "closely related"? Are we looking at strains of one species, the same genus, or something else?
Yes, sorry, I did not mention it! They are indeed in the same genus: two gecko species. Gekko japonicus is the closest and best reference we have.
Why not do both? A blastx analysis against a single species' proteome doesn't take that long. In the end, you get an extra column of evidence: putative orthologous sequences. If these are good hits, you can give them precedence when assigning gene names and annotations. But sometimes you might miss something, or the orthologous gene is not correctly annotated. I always prefer to have multiple lines of evidence from BLAST, InterPro, and other tools.
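The "do both, give precedence to good hits" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `related_hits` holds the best blastx hit per transcript against the related species, and `other_names` holds a name from some other source (for example parsed from a Trinotate or InterPro report; how you extract it is not shown). The function name `merge_evidence` and the 70 % identity / 1e-10 e-value thresholds are hypothetical choices. Every evidence column is kept in the output so nothing is silently discarded.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    subject: str   # protein ID in the related species' proteome
    pident: float  # percent identity
    evalue: float


def merge_evidence(
    transcripts: Iterable[str],
    related_hits: Dict[str, Hit],
    other_names: Dict[str, str],
    min_pident: float = 70.0,   # hypothetical threshold for a "good" hit
    max_evalue: float = 1e-10,  # hypothetical threshold for a "good" hit
) -> List[dict]:
    """Assign one name per transcript, preferring good related-species hits."""
    rows = []
    for tid in transcripts:
        hit: Optional[Hit] = related_hits.get(tid)
        other = other_names.get(tid)
        good = (hit is not None
                and hit.pident >= min_pident
                and hit.evalue <= max_evalue)
        if good:
            name, source = hit.subject, "related_blastx"
        elif other is not None:
            name, source = other, "other"  # fall back on the other evidence
        else:
            name, source = None, "none"
        rows.append({
            "transcript": tid,
            "related_hit": hit.subject if hit else None,  # kept even if weak
            "other_evidence": other,
            "assigned": name,
            "source": source,
        })
    return rows


if __name__ == "__main__":
    # Made-up IDs and names, only to show the behaviour.
    rows = merge_evidence(
        ["t1", "t2", "t3"],
        {"t1": Hit("protA", 95.0, 1e-80), "t2": Hit("protB", 40.0, 1e-6)},
        {"t2": "other_name_B"},
    )
    for row in rows:
        print(row)
```

Here `t1` takes its name from the related species, `t2` falls back on the other evidence because its blastx hit is below the identity threshold (while still recording `protB`), and `t3` stays unannotated.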