Question

Annotation with de novo or genome guided transcriptome assembly

6.0 years ago

EarlyEvol ▴ 30

Hi all,

This might be pretty inconsequential in the end, but should I use a de novo or a genome-guided transcriptome assembly to feed into an annotation pipeline (funannotate)? It seems to me that the trade-off is accuracy versus independence of evidence. A genome-guided assembly might be more accurate, but it is somewhat redundant, because the RNA-seq reads are mapped to the genome to create the assembly and the same mappings are also used directly as evidence. A de novo assembly is more error-prone but completely independent of the genome structure.

One thing that might change the answer is that I'm really interested in gene paralogs, which Trinity's genome-guided approach is (reportedly) better at identifying.

This probably falls into the category of over-optimization, but I would like to get someone else's take on it for sanity (and knowledge).

Thanks, Earl

RNA-Seq Assembly Trinity annotation funannotate • 2.2k views

updated 6.0 years ago by h.mon 35k • written 6.0 years ago by EarlyEvol ▴ 30

score 0 · Answer 1 · 2018-11-16

If you have RNA-seq data and pass it to funannotate predict, it will run the genome-guided assembly for you (and parse the results, etc.). From the manual:

funannotate train

In order to use this script you will need RNA-seq data from the genome you are annotating, if you don't have RNA-seq data then funannotate predict will train Augustus during runtime. This script is a wrapper for genome-guided Trinity RNA-seq assembly followed by PASA assembly. These methods will generate the input data to funannotate predict, i.e. coord-sorted BAM alignments, trinity transcripts, and high quality PASA GFF3 annotation. This script unfortunately has lots of dependencies that include Hisat2, Trinity, Samtools, Fasta, GMAP, Blat, MySQL, PASA, and RapMap. The $PASAHOME and $TRINITYHOME environmental variables need to be set or passed at runtime.

Thankfully, MySQL is not needed; funannotate can use SQLite.
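Since the manual excerpt lists quite a few dependencies and two environment variables, a quick pre-flight check can save a failed run. The following is only a minimal sketch in Python, not part of funannotate: the variable names come from the excerpt, but the executable names are my assumptions about how the tools are typically installed, so adjust them to your system. The manual also says the two paths can be passed at runtime instead, in which case an unset variable is not necessarily a problem.

```python
import os
import shutil

# Environment variables named in the funannotate manual excerpt above.
REQUIRED_ENV = ("PASAHOME", "TRINITYHOME")

# ASSUMPTION: these executable names are guesses at common install names;
# the manual lists the tools, not the exact binaries to look for.
ASSUMED_EXECUTABLES = ("hisat2", "Trinity", "samtools", "gmap", "blat")


def missing_prerequisites(env=None, which=shutil.which):
    """Return (unset environment variables, executables not found on PATH)."""
    env = os.environ if env is None else env
    missing_env = [name for name in REQUIRED_ENV if not env.get(name)]
    missing_exe = [exe for exe in ASSUMED_EXECUTABLES if which(exe) is None]
    return missing_env, missing_exe


if __name__ == "__main__":
    unset_vars, not_found = missing_prerequisites()
    print("unset environment variables:", unset_vars or "none")
    print("executables not on PATH:", not_found or "none")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.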

In my experience, funannotate finds more genes with the genome-guided assembly than with a de novo assembly followed by mapping of the transcripts.
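If you want to check this on your own data, one simple way is to annotate twice and compare the number of gene features in the two GFF3 outputs. Here is a rough sketch in Python, assuming standard tab-delimited GFF3 where the feature type is in column 3; the script and file names in the usage line are hypothetical. A raw gene count says nothing about whether the extra models are correct, so treat it as a first look only.

```python
import sys


def count_genes(gff3_lines):
    """Count features whose type (column 3) is 'gene' in GFF3 text lines."""
    count = 0
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "gene":
            count += 1
    return count


if __name__ == "__main__":
    # Print one count per GFF3 file given on the command line.
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, count_genes(handle))
```

Usage would look like `python count_genes.py genome_guided.gff3 de_novo.gff3`, with both file names standing in for whatever your two annotation runs produced.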