I am trying to assemble the goat genome (genome size = 2.9 Gb), and I have goat sequencing data from both short and long reads:
- Short-read data from Illumina (~37x genome coverage).
- Long-read data from PacBio (~1.5x genome coverage).
I have assembled the Illumina short reads using ABySS and SOAPdenovo; the best N50 I obtained was 1884, at a k-mer size of 41. I would like to improve the short-read assembly using the PacBio long-read data. Because the PacBio coverage is so low (~1.5x), I am unable to decide which software would be best for improving the N50 with long reads.
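For anyone comparing numbers, here is a minimal sketch of how N50 is commonly computed from a list of contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not values from this assembly, and individual assemblers may apply their own minimum-contig-length filters before reporting.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    # Longest contigs first.
    ordered = sorted(lengths, reverse=True)
    # N50 is reached once the running sum covers half the total length.
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, for illustration only).
    toy_contigs = [100, 200, 300, 400, 500]
    print(n50(toy_contigs))  # prints 400
```

In the toy case the total is 1500, so the threshold of 750 is crossed when the 400-long contig is added after the 500-long one.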
I tried hybridSPAdes for a hybrid assembly of my short- and long-read data, but it fails with an out-of-memory error.
Please let me know how I could improve the short-read assembly using such low-coverage (~1.5x) long reads.
What was the input read length of your Illumina data?
An optimal k-mer of 41 seems pretty low; what range did you evaluate?