MSMC requires a mappability mask, and for non-model organisms you will likely have to generate it yourself. I am working with three-spined sticklebacks, and the sequencing for each individual comprises three libraries: 100 bp reads with 140 bp and 300 bp insert sizes, and 50 bp reads with 3 kb insert sizes.
SNPable was designed with single-end reads in mind, so deciding which k-mer size to use is difficult. A guide I read used 250-mers for a single paired-end library, though it didn't state the read length or the insert size.
My question is simple: what do I need to consider when deciding which k-mer size to use? The mate-pair library makes this particularly difficult, or so I have been led to believe. Any help would be greatly appreciated.
Hi! Did you find a solution to this problem and would mind sharing it?
Hey, first I tested a few different k-mer sizes to see the effect of changing the size. After looking through other papers with data I thought were similar to mine, I settled on k=100. You can check the preprint here.
Thanks a lot for sharing! Was the effect of different k-mer sizes huge?
Hey, I cannot remember, and all those files are compressed on a backup server. If you have access to an HPC, it would be quite simple to test a series of sizes and have a look yourself. The result will be different for each species, depending on things like repetitive content. Apologies that I cannot be more helpful, but it was 3 years ago.
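To give a rough idea of what "testing a series of sizes" measures, here is a minimal Python sketch. It is not SNPable and not what I ran at the time. It assumes exact matches on the forward strand only, whereas a real mask is built by mapping k-mers back to the reference. The function name `unique_kmer_fraction` and the toy sequence are made up for illustration.

```python
from collections import Counter


def unique_kmer_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once in seq.

    Simplifying assumptions (not how SNPable works): exact matches only,
    forward strand only, a single sequence held in memory.
    """
    n = len(seq) - k + 1  # number of k-mer start positions
    if n <= 0:
        raise ValueError("k is longer than the sequence")
    # Count every k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n))
    # A position counts as "mappable" here if its k-mer is seen only once.
    unique = sum(1 for i in range(n) if counts[seq[i:i + k]] == 1)
    return unique / n


if __name__ == "__main__":
    # Hypothetical toy sequence containing the same 8 bp block twice.
    toy = "ACGTACGTTTGACCA" + "GATTACAG" + "CCTGAAGT" + "GATTACAG" + "TTCAGGCA"
    for k in (4, 8, 12):
        print(k, round(unique_kmer_fraction(toy, k), 3))
```

On a real genome you would run the actual mask pipeline once per k and compare the fraction of the genome that ends up in each mask. The toy only shows the general pattern: k-mers that span a repeat into unique flanking sequence become unique as k grows, so how much you gain from a larger k depends on the repeat content of your species.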
Hi, don't worry! Thanks a lot for your response, it helped me for sure.