Hi! I am running SPAdes and the number of contigs changes when I run it with the same k-mer size but at different times. For example, if I run:
spades.py \
-1 $READS1 \
-2 $READS2 \
-s $READS3 \
-o $RESULTS \
-k 125 \
--careful \
--threads $SLURM_CPUS_PER_TASK
I get 229 contigs.
If I run:
spades.py \
-1 $READS1 \
-2 $READS2 \
-s $READS3 \
-o $RESULTS \
-k 111,115,117,121,123,125 \
--careful \
--threads $SLURM_CPUS_PER_TASK
I get 244 contigs in the final contigs.fasta file. When I count the contigs in the final_contigs.fasta file of each generated folder, I find that the K125 folder also contains 244 contigs, so I deduce that contigs.fasta was obtained with the 125 k-mer.
Does anyone know why this is happening? Thank you!
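For anyone who wants to repeat the per-folder count described above, here is a minimal sketch. It assumes RESULTS points at the directory given to -o (the fallback name spades_out is only a placeholder) and that each K* folder holds a final_contigs.fasta, as in the run described in this post. The helper name count_contigs is made up for illustration.

```shell
# Count FASTA header lines (one per contig) in a single file.
count_contigs() {
    grep -c '^>' "$1"
}

# RESULTS is assumed to be the SPAdes output directory passed to -o;
# "spades_out" is only a placeholder default.
RESULTS="${RESULTS:-spades_out}"

# Report the count for every per-k folder and for the final contigs.fasta.
for f in "$RESULTS"/K*/final_contigs.fasta "$RESULTS"/contigs.fasta; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_contigs "$f")"
    fi
done
```

Comparing the last line of the output with the K125 line shows whether the two files hold the same number of contigs.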
In that case, wouldn't -k 111,115,117,121,123,125 be expected to give the best result, that is, fewer contigs than -k 125 alone?
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.
A smaller number of contigs doesn't necessarily mean a better assembly. As an imaginary example, wrongly joining two contigs (a mis-assembly) may decrease the number of contigs and increase N50 but, because the join is incorrect, it decreases assembly quality.
There are a number of different metrics for evaluating an assembly. To get a good picture of assembly quality, it is recommended to use several of them.
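As a rough illustration of looking at more than one number, the sketch below prints the contig count, total assembled length, and N50 for a FASTA file. The function name assembly_stats is hypothetical, and N50 is taken here as the length of the contig at which the running sum of lengths (longest first) reaches half of the total. These are only length-based metrics: they cannot detect a mis-assembly, so they complement rather than replace a reference-based or read-based evaluation.

```shell
# Print contig count, total length, and N50 for one FASTA file.
assembly_stats() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: emit previous contig length
        { len += length($0) }                           # sequence line: add to current contig
        END { if (len > 0) print len }                  # emit the last contig
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }                  # lengths arrive longest first
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "contigs=%d total_bp=%d N50=%d\n", NR, total, n50
        }
    '
}
```

For example, running assembly_stats on the contigs.fasta of each run gives two lines that can be compared side by side, instead of comparing contig counts alone.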