I am working with a non-model species for which another group has published a genome assembly. The assembly is still at the level of scaffolds (N50 = 3244), and no genes have been annotated. For their study, this group used around 100,000 de novo assembled transcripts (from paired end RNA-seq) to assist in the scaffolding.
I'm working with a significantly larger data set of RNA-seq reads (also paired end, 150 bp Illumina reads) that I've assembled into a de novo transcriptome. The transcriptome itself is, needless to say, also larger and more complete (according to BUSCO) than the one the genome study assembled.
I am wondering if I can now use this data to improve the existing genome assembly. Or would this be a fool's errand? If that is possible, could someone perhaps suggest a good pipeline or set of tools for this?
AFAIK, transcripts are not usually used for genome assembly. You can use them for gene annotation, though. In any case, an assembly with an N50 of 3k may be very challenging to work with, so you may want to consider improving it in more conventional ways: more sequencing data, long reads, Hi-C/optical/genetic maps, etc.
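For context on the N50 figure: N50 is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. If you want to check how much any re-scaffolding attempt actually changes contiguity, a minimal sketch along these lines could work. The function names `n50` and `fasta_lengths` are made up for illustration and are not from any particular toolkit, and the lengths in the usage note are toy values, not from the assembly discussed here.

```python
def fasta_lengths(lines):
    """Yield one sequence length per record from an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

With a real file you would pass an open file handle, e.g. `n50(fasta_lengths(open("scaffolds.fasta")))` (the file name is a placeholder). On the toy lengths `[2, 2, 2, 3, 3, 4, 8, 8]` the function returns 8.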
Thank you for the feedback. Do you have any recommendations for tools I could use to annotate the genome as you suggested?
PASA can be a good tool to start with.
Thank you, I'll take a look at PASA as you suggested.