Dear all,
I am trying to do a de novo assembly of my RNA-seq data. The first program I used was SOAPdenovo-Trans. It works very well and provides some very useful statistics about the scaffolds and contigs.
Now I am trying a de novo assembly with Trinity. My output is a FASTA file. Does Trinity do scaffolding? All I can find is information about contig N10 through N50 and the number of transcripts and genes.
I would like to compare the results from SOAPdenovo-Trans and Trinity, but I am not sure how to get scaffold information from the Trinity output.
Or, if you have any other experience comparing output from different programs, please share it with me.
Many thanks for any help!
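One way to compare the two assemblies on equal terms is to compute the same length statistics (number of sequences, total bases, N10, N50, longest sequence) directly from each FASTA file, instead of relying on each program's own report. Below is a minimal sketch in plain Python. The script name and the example file names in the usage comment are hypothetical. Keep in mind that for transcriptomes N50 is only a rough yardstick, since the number of isoforms reported per gene affects it.

```python
import sys
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: List[int], x: int = 50) -> int:
    """Nx: the largest length L such that sequences of length >= L hold at least x% of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # integer comparison avoids floating-point rounding at the threshold
        if running * 100 >= total * x:
            return length
    return 0


def summarize(lengths: List[int]) -> str:
    """One-line summary so different assemblies can be compared side by side."""
    return (f"n={len(lengths)} total={sum(lengths)} "
            f"N10={nx(lengths, 10)} N50={nx(lengths, 50)} "
            f"max={max(lengths, default=0)}")


if __name__ == "__main__":
    # e.g. python assembly_stats.py soap_scaffolds.fa Trinity.fasta (file names are examples)
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(fasta_lengths(handle)))
```

Running it on both outputs gives one line per assembly computed the same way, so differences reflect the assemblies rather than how each tool reports its statistics.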
Thank you, Damian, for your comment. Do you have any experience with robust tools for scaffolding?
There are plenty of tools for scaffolding genomes, but I don't think there are any specifically for transcriptomes. This is probably because most transcriptome assemblers output multiple isoforms of the same gene, produced by various graph traversal algorithms. If you try to use the paired-end (PE) information to scaffold these transcripts, you run the risk of fusing isoforms together.
Thank you, Damian, for the comment and explanation. So do you recommend keeping just the contigs for downstream analysis (TransDecoder, etc.)?
Trinity tends to output a lot of transcripts (I easily get over 150k). What you can do is annotate your transcripts with TransDecoder, BLAST homology searches, and HMM searches, and then let the people using your transcriptome decide whether or not to trust each transcript. You can also set up an arbitrary scoring system: add +1 to a transcript's score for each annotation source that supports it, then categorize your transcripts by their score.
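The scoring idea above can be sketched in a few lines of Python. This assumes you have already reduced each annotation source to a set of transcript IDs that have evidence from that source; only the BLAST case is parsed here, from tabular output (`-outfmt 6`, where the first column is the query ID). TransDecoder and HMM results would need their own extraction step, and their IDs may have to be mapped back to transcript IDs first. The source names and IDs (`tx1`, `sp_hit_A`, ...) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def blast_query_ids(lines: Iterable[str]) -> Set[str]:
    """Query IDs with at least one hit in tabular BLAST output (-outfmt 6: qseqid is column 1)."""
    ids: Set[str] = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t", 1)[0])
    return ids


def score_transcripts(all_ids: Iterable[str],
                      evidence: Dict[str, Set[str]]) -> Dict[str, int]:
    """+1 for each annotation source with any evidence for a transcript (sets, so at most once per source)."""
    scores = {tid: 0 for tid in all_ids}
    for hits in evidence.values():
        for tid in hits:
            if tid in scores:  # ignore IDs that are not in the assembly
                scores[tid] += 1
    return scores


def categorize(scores: Dict[str, int]) -> Dict[int, List[str]]:
    """Group transcript IDs by score (0 = no support from any source)."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for tid, score in scores.items():
        groups[score].append(tid)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical IDs; in practice these sets come from your TransDecoder, BLAST and HMM results.
    evidence = {
        "transdecoder": {"tx1", "tx2"},
        "blast": blast_query_ids(["tx1\tsp_hit_A\t98.5\n", "tx3\tsp_hit_B\t41.0\n"]),
        "hmm": {"tx1"},
    }
    groups = categorize(score_transcripts(["tx1", "tx2", "tx3", "tx4"], evidence))
    for score in sorted(groups, reverse=True):
        print(score, sorted(groups[score]))
```

Using sets per source means a transcript with many BLAST hits still gets only one point from BLAST, which keeps the score a simple count of independent lines of evidence.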