Hi all - I have a question regarding how to generate a "good enough" genome assembly for comparative genomics purposes (across species). For some species, the only sequencing data I have available is low-coverage (around 20X) 150bp Illumina paired reads. I do have sequencing data from two different, closely related individuals though, and several good-quality assemblies are available for closely related species. I have tried using SPades (after quality control etc), but the assembly is extremely fragmented, with a very low BUSCO score (around 20% C, 40% F), which is what one would expect given the low coverage. I could try alternative assemblers (SOAPdenovo2, Abyss, MaSuRCA etc), but have no reason to believe the results would be any better.
Is there a way to use the sequencing data from the other related individual and/or the reference sequences from closely related species to improve my assembly? The genome I want to generate an assembly for is a mollusc genome with an expected size of around 1.5Gb. I have tried to find information about reference-guided genome assembly, but nothing seems to quite fit my particular case. Unfortunately, generating better sequencing data from the species in question will not be possible, and it would be disappointing not to be able to use the data available!
Thanks very much - any help and suggestions would be appreciated
Thanks - that definitely looks like something worth looking into. Will give it a try!