I used Velvet to assemble genomic data from a plant and plotted a coverage histogram and a length-weighted coverage histogram, as suggested in the manual. The reads were 150 bp paired-end Illumina reads. I tried various k-mer values and picked 115. What would be a good coverage cutoff to use, considering that I have a small peak at 7? Please find 3 attachments. The expected coverage calculated by Velvet is 23. With the default coverage cutoff (half of the expected coverage), I get the following assembly:
Nodes = 412915
N50 = 21497
Max length = 185793
Total = 362 MB
No. of contigs = 48,614
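For context on the coverage values quoted above: Velvet works in k-mer coverage, which its manual relates to nucleotide coverage C by Ck = C * (L - k + 1) / L, where L is the read length. The sketch below applies that relation to the numbers in this post. It assumes every read is exactly 150 bp (untrue after any trimming), and the function names are illustrative, not part of Velvet.

```python
def kmer_to_base_coverage(kmer_cov, read_len, k):
    """Invert Ck = C * (L - k + 1) / L to get nucleotide coverage C."""
    return kmer_cov * read_len / (read_len - k + 1)


def base_to_kmer_coverage(base_cov, read_len, k):
    """Ck = C * (L - k + 1) / L, as given in the Velvet manual."""
    return base_cov * (read_len - k + 1) / read_len


READ_LEN = 150   # from the post; assumes untrimmed, full-length reads
K = 115          # k-mer size chosen in the post

expected_kmer_cov = 23   # expected coverage reported by Velvet
small_peak_kmer_cov = 7  # position of the small peak in the histogram

# Default cutoff as described in the post: half of the expected coverage.
default_cutoff = expected_kmer_cov / 2

main_base_cov = kmer_to_base_coverage(expected_kmer_cov, READ_LEN, K)
small_base_cov = kmer_to_base_coverage(small_peak_kmer_cov, READ_LEN, K)

print(f"default cutoff (k-mer coverage): {default_cutoff}")
print(f"main peak, nucleotide coverage:  {main_base_cov:.1f}x")
print(f"small peak, nucleotide coverage: {small_base_cov:.1f}x")
```

With k = 115 and 150 bp reads, each read contributes only 36 k-mers, so a k-mer coverage of 23 corresponds to roughly 96x nucleotide coverage, and the peak at 7 sits below the default cutoff of 11.5.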
I wanted to use a lower cutoff to include the k-mers in the smaller peak, so I tried a coverage cutoff of 3 and got the following:
Nodes = 513117
N50 = 20630
Max length = 185793
Total = 384 MB
No. of contigs = 56,475
The expected genome size is 370-390 MB. Since the genome is expected to contain about 50-60% repeats, I do not expect the reads to cover the entire genome. This is also evident from the SAM/BAM files obtained by aligning the reads to a closely related genome, where I see that 10 MB is not covered.
Which of the two assemblies looks better?
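To make the comparison concrete, the figures from the two runs above can be put side by side. This is a minimal sketch: the dictionary keys and helper names are only for illustration, and the numbers are copied from the post.

```python
# Statistics copied from the post (total size in the post's own "MB" units).
run_default = {"nodes": 412915, "n50": 21497, "max_length": 185793,
               "total_mb": 362, "contigs": 48614}
run_cutoff3 = {"nodes": 513117, "n50": 20630, "max_length": 185793,
               "total_mb": 384, "contigs": 56475}

EXPECTED_SIZE_MB = (370, 390)  # expected genome size range from the post


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def in_expected_range(total_mb):
    """True if the assembly total lies inside the expected genome size range."""
    low, high = EXPECTED_SIZE_MB
    return low <= total_mb <= high


changes = {key: percent_change(run_default[key], run_cutoff3[key])
           for key in run_default}

for key, change in changes.items():
    print(f"{key:>10}: {change:+.2f}%")

print("default cutoff total in expected range:",
      in_expected_range(run_default["total_mb"]))
print("cutoff 3 total in expected range:      ",
      in_expected_range(run_cutoff3["total_mb"]))
```

Lowering the cutoff adds about 6% more sequence, while N50 drops by about 4% and the contig count rises by about 16%. A total that falls inside the expected range does not by itself show that the added sequence is correct.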
I would definitely run more than one assembler, preferably with multiple k-mer values, and then compare the assemblies using QUAST.
You can also look at KAT (https://kat.readthedocs.io/en/latest/walkthrough.html#genome-assembly-analysis-using-k-mer-spectra) to assess the k-mer spectra of the reads and of the assembly. I am not sure whether your plant has high ploidy or not. It would also be important to assess BUSCO scores for the different assemblies and, if RNA-seq data is available, its mapping rates.
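N50 is one of the contiguity metrics quoted in the question, and QUAST reports it for each assembly. As a reminder of what it measures, here is a minimal sketch of the calculation. The contig lengths are made-up toy values, not taken from either assembly.

```python
def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths, for illustration only.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print("toy N50:", toy_n50)
```

Because N50 depends on the total assembled length, adding many short contigs at a lower cutoff can lower N50 even when the long contigs are unchanged, which is one reason to look at several metrics rather than N50 alone.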
This is the best option: multiple assemblers and multiple k-mer values. Decreasing the coverage cutoff to gain contiguity doesn't help, as it increases the chances of erroneous overlaps.
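Before settling on a cutoff, it can help to tally how much sequence sits at each coverage level and how much of it a given cutoff would discard. The sketch below builds a length-weighted coverage histogram from (length, coverage) pairs. The node values are hypothetical. In a real run the pairs would come from Velvet's stats.txt (I believe the relevant columns are lgth and short1_cov, but check the header of your own file).

```python
from collections import Counter


def length_weighted_histogram(nodes):
    """Sum node lengths per integer coverage bin.

    `nodes` is an iterable of (length, coverage) pairs; coverage is
    floored to an integer bin.
    """
    hist = Counter()
    for length, coverage in nodes:
        hist[int(coverage)] += length
    return dict(hist)


def length_below_cutoff(nodes, cutoff):
    """Total length of nodes whose coverage is below `cutoff`."""
    return sum(length for length, coverage in nodes if coverage < cutoff)


# Hypothetical (length, coverage) pairs, for illustration only.
toy_nodes = [(500, 2.4), (1200, 7.1), (800, 6.8),
             (30000, 22.5), (25000, 23.9), (400, 1.2)]

hist = length_weighted_histogram(toy_nodes)
for cov_bin in sorted(hist):
    print(f"coverage {cov_bin:>2}: {hist[cov_bin]} bp")

for cutoff in (11.5, 3):
    print(f"cutoff {cutoff}: {length_below_cutoff(toy_nodes, cutoff)} bp below cutoff")
```

Comparing the tallies for cutoffs 11.5 and 3 shows how much of the small peak each setting keeps. Whether those low-coverage nodes are real sequence or errors still has to be checked with the tools suggested above.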
Hello deepti1rao,
The link you’ve added points to the page that contains the image, not the image itself. On the ibb.co site, right-click (or Ctrl-click on a Mac) on the image and select Copy Image Address (or an equivalent option). Use that link instead of the one you used to embed the image.
Thanks, will do from next time onwards.