I am doing an assembly vs. read comparison, and I have noticed something quite confusing.
I extracted k-mers from an assembly file and from the corresponding reads (downloaded from the NCBI Assembly and SRA databases), and when I compare them, some k-mers are present in the assembly but absent from the reads.
So I am wondering whether this is possible and, if so, how?
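For reference, here is a minimal sketch of the comparison described above, assuming the assembly is a FASTA (`.fna`) file, the reads are FASTQ files with 4-line records (optionally gzipped), and the same k is used for both sides (k = 31 below is an assumption, not a value from this thread). It counts *canonical* k-mers, meaning each k-mer is stored as the smaller of itself and its reverse complement. Reads come from both strands, so a k-mer that is not canonicalized can look "assembly-only" simply because the reads only contain its reverse complement; whether the original extraction did this is not stated here.

```python
import gzip
from typing import Iterable, Iterator, List, Set

# Complement table for DNA; N stays N, other IUPAC codes are left unchanged
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def open_lines(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh


def fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield one upper-cased sequence per FASTA record (wrapped lines joined)."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming unwrapped 4-line FASTQ."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def kmer_set(seqs: Iterable[str], k: int) -> Set[str]:
    """Canonical k-mers of all sequences, skipping k-mers that contain N."""
    result: Set[str] = set()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                result.add(canonical(kmer))
    return result


def assembly_only_kmers(assembly_path: str, read_paths: List[str], k: int = 31) -> Set[str]:
    """Canonical k-mers present in the assembly but in none of the read files."""
    asm = kmer_set(fasta_sequences(open_lines(assembly_path)), k)
    reads: Set[str] = set()
    for path in read_paths:
        reads |= kmer_set(fastq_sequences(open_lines(path)), k)
    return asm - reads
```

Usage would look like `missing = assembly_only_kmers("assembly.fna", ["reads_1.fastq.gz", "reads_2.fastq.gz"])`, where the file names are hypothetical. Holding every read k-mer in a Python set is only practical for small genomes; a dedicated k-mer counter such as Jellyfish or KMC is the usual choice for real datasets.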
OK, yes, you are right about that. I did not think about this case.
However, I assumed that modern assemblers only assemble highly covered regions, in which case the k-mers spanning the junction between reads (in your example, `ormati`) should also be contained in at least one of the reads.
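The junction case can be illustrated with a toy example. The reads below are hypothetical, chosen so that they overlap by only three bases (fewer than k = 6), and plain, non-canonical k-mers are used for readability:

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


k = 6
assembly = "information"
reads = ["informa", "rmation"]  # hypothetical reads overlapping by "rma"

read_kmers = set().union(*(kmers(r, k) for r in reads))
missing = sorted(kmers(assembly, k) - read_kmers)
print(missing)  # ['format', 'ormati']

# One additional (hypothetical) read that spans the junction covers the gap
more_reads = reads + ["nformatio"]
more_read_kmers = set().union(*(kmers(r, k) for r in more_reads))
missing_after = sorted(kmers(assembly, k) - more_read_kmers)
print(missing_after)  # []
```

With only two reads, `format` and `ormati` span the junction and occur in neither read; once a read covers the junction, no assembly k-mer is missing, which is the situation expected for well-covered regions.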
Please use ADD COMMENT/ADD REPLY when responding to existing posts, to keep threads logically organized. This belongs under @Wouter's answer. SUBMIT ANSWER is for new answers to the original question.

Well, I hope it corresponds to the same set of reads used for the assembly. I just searched through NCBI; let's take this one as an example:
https://www.ncbi.nlm.nih.gov/biosample/SAMEA1705947/
From the "Assembly" database I got the .fna files, and from the SRA I got the read files. Shouldn't these reads correspond to the ones used for the assembly?