Hi!
I am trying to troubleshoot two de novo Trinity assemblies. The libraries, one for each of two sponge species, were sequenced in the same run, and I obtained 2x150 bp reads to a depth of 124x. We already have a whole transcriptome assembled for each species, but for our purposes I would like a de novo assembly. The GC content of my new assemblies is 3-7% lower than that of our old assemblies. Furthermore, my assemblies have many short contigs (N50: 800 bp vs. 1800 bp for the old assemblies; median length: 300 vs. 800 bp; mean length: 600 vs. 1200 bp). The nail in the coffin is that few reads align in proper paired orientation when mapped back to my de novo assemblies: only ~50% are in proper pairs.
I am most worried about the GC content. The GC content of the reads is similar to that of our old transcriptomes; it is only lower after assembly. I have changed the adapter trimming parameters and tried Trinity's jaccard clip setting, but my assembly stats remain almost identical from run to run.
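In case it helps anyone reproduce the comparison, here is a minimal sketch of how the stats above (N50, median and mean contig length, GC%) could be computed from a FASTA file. The function names (`read_fasta`, `n50`, `assembly_stats`) and the toy sequences are made up for illustration; this is not Trinity's own stats script, and its numbers may differ slightly from what other tools report (for example, in how N bases are counted).

```python
from statistics import mean, median


def read_fasta(lines):
    """Yield one sequence string per record from an iterable of FASTA lines."""
    seq = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(seqs):
    """Summarise contig lengths and GC content (GC% is over A/C/G/T bases only)."""
    seqs = [s.upper() for s in seqs]
    lengths = [len(s) for s in seqs]
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(base) for s in seqs for base in "ACGT")
    return {
        "n_contigs": len(seqs),
        "n50": n50(lengths),
        "median_len": median(lengths),
        "mean_len": mean(lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


# Toy input standing in for a real assembly file (hypothetical sequences).
toy_fasta = ">a\nGGCC\n>b\nAT\nAT\n>c\nGC\n"
stats = assembly_stats(read_fasta(toy_fasta.splitlines()))
print(stats)
```

For a real assembly, the same functions can be fed an open file handle instead of the toy string, once for the new assembly and once for the old one, so the two sets of numbers are computed the same way.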
Has anyone received assemblies with low GC and short contigs before? If so, what did you do to fix that?
Thanks! If there's any more information that can prove helpful, please let me know.
If you're using Trinity with that much depth, you might want to use the in silico read normalization parameter. Also, why assemble de novo instead of doing a reference-based assembly, if you already have other assemblies? If you're looking for DEGs, combine all available data to create a single assembly, then align your samples back to that assembly to get abundance estimates.