Annotation with de novo or genome guided transcriptome assembly
6.0 years ago
EarlyEvol ▴ 30

Hi all,

This might be pretty inconsequential in the end, but should I use a de novo or a genome-guided transcriptome assembly as input to an annotation pipeline (funannotate)? It seems to me the trade-off is accuracy versus independence of evidence. A genome-guided assembly might be more accurate, but it is somewhat redundant: the RNA-seq reads are mapped to the genome to build the assembly, and that same read mapping is also used directly as evidence. A de novo assembly is more error-prone, but it is completely independent of the genome structure.

One thing that might change the answer is that I'm really interested in gene paralogs, which Trinity's genome-guided approach is (reportedly) better at identifying.

This probably falls into the category of over-optimization, but I would like to get someone else's take on it for sanity (and knowledge).

Thanks, Earl

RNA-Seq Assembly Trinity annotation funannotate
6.0 years ago
h.mon 35k

If you have RNA-seq data and pass it to funannotate (funannotate train, whose outputs then feed funannotate predict), it will do the genome-guided assembly for you (and parse the results, etc.). From the manual:

funannotate train

In order to use this script you will need RNA-seq data from the genome you are annotating, if you don't have RNA-seq data then funannotate predict will train Augustus during runtime. This script is a wrapper for genome-guided Trinity RNA-seq assembly followed by PASA assembly. These methods will generate the input data to funannotate predict, i.e. coord-sorted BAM alignments, trinity transcripts, and high quality PASA GFF3 annotation. This script unfortunately has lots of dependencies that include Hisat2, Trinity, Samtools, Fasta, GMAP, Blat, MySQL, PASA, and RapMap. The $PASAHOME and $TRINITYHOME environmental variables need to be set or passed at runtime.
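Because funannotate train depends on so many external tools, a quick pre-flight check can save a failed run. The sketch below only checks that the two environment variables named in the manual are set and that a set of executables is on PATH. The executable names are assumptions (for example, the FASTA package may install fasta36, and your Trinity binary may live under $TRINITYHOME rather than on PATH); MySQL is left out on the assumption that the SQLite backend is used.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Environment variables the manual says must be set (or passed at runtime).
REQUIRED_ENV_VARS = ("PASAHOME", "TRINITYHOME")

# ASSUMPTION: executable names for the listed dependencies. Adjust these to
# whatever your installation actually provides.
REQUIRED_EXECUTABLES = ("hisat2", "Trinity", "samtools", "fasta36", "gmap", "blat", "rapmap")


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a description of each prerequisite that appears to be missing."""
    problems = []
    for var in REQUIRED_ENV_VARS:
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")
    for exe in REQUIRED_EXECUTABLES:
        if which(exe) is None:
            problems.append(f"executable {exe!r} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    if issues:
        print("Missing prerequisites:")
        for issue in issues:
            print("  -", issue)
    else:
        print("All listed prerequisites found.")
```

The env and which parameters are injectable so the check can be tested without touching the real environment.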

Thankfully, MySQL is not needed; funannotate can use SQLite instead.

In my experience, funannotate finds more genes with the genome-guided assembly than with a de novo assembly whose transcripts are then mapped to the genome.
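To put numbers on a comparison like this, one option is to tally the feature types in each run's GFF3 output. This is a minimal sketch, assuming standard nine-column GFF3 input; it stops at a ##FASTA directive if the file embeds sequences. The file path in the usage note is a placeholder, not a funannotate output name.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff3_lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type) of a GFF3 file, skipping comments and directives."""
    counts: Counter = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts
```

Usage: `with open("path/to/annotation.gff3") as fh: print(count_feature_types(fh)["gene"])`, run once per annotation you want to compare.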


Thanks for your reply.

Dang! I should have read the manual more carefully. I guess that since the train command does a genome-guided assembly, that is the recommended method.

I ran Trinity and PASA separately, feeding a Trinity de novo transcript set to PASA. Then I fed the PASA GFF3, along with the original Trinity assembly and an RNA-seq BAM, to funannotate predict. This seemed to work reasonably well.
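Before handing a Trinity transcript set to PASA, it can be useful to see how many isoforms collapse into each Trinity "gene" (component), especially if paralogs are of interest. The sketch relies on Trinity's header naming convention, where isoform IDs end in _i followed by a number (e.g. TRINITY_DN1000_c115_g5_i1 belongs to TRINITY_DN1000_c115_g5); the same suffix is used for genome-guided IDs starting with TRINITY_GG_. Trinity "genes" are assembly components, not curated genes, so treat the grouping as a rough indicator only.

```python
import re
from collections import defaultdict
from typing import Iterable

# Trinity isoform IDs end in "_i<N>"; dropping that suffix gives the
# Trinity gene (component) ID.
_ISOFORM_SUFFIX = re.compile(r"^(?P<gene>\S+)_i\d+$")


def group_isoforms(fasta_lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each Trinity gene ID to the isoform IDs found in a FASTA file."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split()
        if not parts:
            continue  # empty header
        seq_id = parts[0]
        m = _ISOFORM_SUFFIX.match(seq_id)
        gene = m.group("gene") if m else seq_id  # non-Trinity IDs are kept as-is
        groups[gene].append(seq_id)
    return dict(groups)
```

Comparing the number of keys (gene-level components) with the total number of isoforms gives a quick sense of how fragmented or isoform-rich an assembly is.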

I have a genome-guided assembly I could use, so I'll see what differences show up.
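One way to see what differences show up is to list the gene models in one annotation that overlap no gene on the same sequence and strand in the other. This is a minimal sketch, assuming both annotations are GFF3 files with gene features that carry an ID attribute. It uses a simple quadratic scan per sequence and strand, which is fine for a quick look; a dedicated interval tool such as bedtools would scale better.

```python
from typing import Iterable, NamedTuple


class Locus(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: str
    gene_id: str


def read_genes(gff3_lines: Iterable[str]) -> list[Locus]:
    """Collect 'gene' features from GFF3 lines."""
    loci = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9 or f[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        loci.append(Locus(f[0], int(f[3]), int(f[4]), f[6], attrs.get("ID", "")))
    return loci


def unmatched(a: list[Locus], b: list[Locus]) -> list[Locus]:
    """Genes in `a` that overlap no gene on the same sequence and strand in `b`."""
    by_key: dict[tuple[str, str], list[Locus]] = {}
    for g in b:
        by_key.setdefault((g.seqid, g.strand), []).append(g)
    result = []
    for g in a:
        candidates = by_key.get((g.seqid, g.strand), [])
        # GFF3 coordinates are 1-based and inclusive, so this tests interval overlap.
        if not any(h.start <= g.end and g.start <= h.end for h in candidates):
            result.append(g)
    return result
```

Running unmatched in both directions (genome-guided versus de novo, then de novo versus genome-guided) shows the loci each approach found that the other missed.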
