Hi all,
This might be pretty inconsequential in the end, but should I use a de novo or a genome-guided transcriptome assembly to feed into an annotation pipeline (funannotate)? It seems to me that the trade-off is accuracy versus independence of evidence. A genome-guided assembly might be more accurate, but it is a little redundant: the RNA-seq reads are mapped to the genome to build the assembly, and that same mapping is also used directly as evidence. A de novo assembly is more error-prone, yet it is completely independent of the genome structure.
One thing that might change the answer is that I'm really interested in gene paralogs, which Trinity's genome-guided approach is (reportedly) better at identifying.
This probably falls into the category of over-optimization, but I would like to get someone else's take on it for sanity (and knowledge).
Thanks, Earl
Thanks for your reply.
Dang! I should have read the manual more carefully. I guess the "train" command does a GG (genome-guided) assembly, and that is the recommended method.
I ran Trinity and PASA separately, feeding a Trinity de novo transcript set to PASA. Then I fed the PASA GFF, along with the original Trinity assembly and an RNA-seq BAM, to funannotate predict. This seemed to work reasonably well.
I also have a genome-guided assembly I could use, so I'll see what differences show up.