Hello, I have recently generated a de novo fungal genome assembly using SPAdes. To gather some assembly statistics, I ran QUAST version 5.0.2 with the following command:
quast.py --fungus -o QuastResult ../SPAdesAssembly_UK0001.fasta
I have checked the output from QUAST in the report.txt file, and I cannot work out why the "# contigs" value is smaller than the "# contigs (>= 0 bp)" value.
Assembly SPAdesAssembly_UK0001
# contigs (>= 0 bp) 7261 # WHY DO THESE VALUES NOT MATCH?
# contigs (>= 1000 bp) 4364
# contigs (>= 5000 bp) 1537
# contigs (>= 10000 bp) 831
# contigs (>= 25000 bp) 306
# contigs (>= 50000 bp) 117
Total length (>= 0 bp) 36362106
Total length (>= 1000 bp) 34953349
Total length (>= 5000 bp) 28238965
Total length (>= 10000 bp) 23295690
Total length (>= 25000 bp) 15154642
Total length (>= 50000 bp) 8596018
# contigs 5580 # WHY DO THESE VALUES NOT MATCH?
Largest contig 152401
Total length 35833794
GC (%) 48.88
N50 18695
N90 2367
auN 31462.1
L50 435
L90 2695
# N's per 100 kbp 0.00
I have checked the QUAST manual, which explains that "contigs >= x is the total number of contigs of length >= x. This metric doesn't depend on --min-contig command line parameter" and that the "number of contigs is the total number of contigs in the assembly". Surely, then, the total number of contigs should equal the number of contigs with length >= 0 bp in my output.
When I use grep to count the contigs, it returns the same value that QUAST reports for the number of contigs with length >= 0 bp.
grep -c ">" SPAdesAssembly_UK0001.fasta
SPAdesAssembly_UK0001.fasta:7261
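To test whether a minimum-length cutoff could account for the gap, the contigs can also be counted at different length thresholds. Below is a minimal sketch in plain Python, not QUAST's own code. The helper names `contig_lengths` and `count_at_least` are hypothetical, and the 500 bp value in the usage comment is an assumption about the default of QUAST's --min-contig option that should be confirmed against the manual for this version.

```python
def contig_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequence may wrap over several lines
    if length is not None:
        yield length  # finish the last record


def count_at_least(lengths, min_len):
    """Count how many contig lengths are >= min_len."""
    return sum(1 for n in lengths if n >= min_len)


# Usage on the assembly (the cutoff of 500 is an assumed value, see above):
# with open("SPAdesAssembly_UK0001.fasta") as handle:
#     lengths = list(contig_lengths(handle))
# print(count_at_least(lengths, 0), count_at_least(lengths, 500))
```

If the count at the assumed cutoff matches the "# contigs" row, the difference would come from short contigs being excluded from that row rather than from any problem in the FASTA file.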
I can't find much more detail on the difference between these two metrics, or on why the total number of contigs QUAST reports is lower than the number of contigs with length >= 0 bp. Can anyone explain why the two numbers differ?
Thank you