I'll be teaching a course on NGS, and I'd like the students to hunt for some well-defined non-synonymous variants in the domain of a given protein.
My first idea: duplicate one chromosome, insert a mutation into one copy, and let samtools wgsim generate the (heterozygous) mutations.
Second idea: change the source code of wgsim. However, it's not clear to me where I should put my snippet of C code to insert the mutation. Do you have any idea? Thanks.
If you prefer to modify the source code, see the "wgsim_print_mutref()" function. It takes two haploid sequences, "hap1" and "hap2" (more precisely, the haploid differences from the reference), as input and prints out all the differences from the reference. The lower two bits of each element keep the actual base. Once you understand how that works, it is easy to generate these sequences yourself.
BTW, an updated version is here, with some bug fixes.
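To make the "lower two bits" remark a bit more concrete, here is a rough Python sketch of the general idea: each position is stored as a small integer whose lowest two bits encode the base, with a flag in the higher bits marking a change. This is only an illustration, not wgsim's code. The flag value `SUBST_FLAG`, the A/C/G/T numbering, and all function names are assumptions made up for the example, so check the actual constants in the wgsim source before relying on them.

```python
# Illustration only: SUBST_FLAG and the base numbering are made up for this
# sketch and are NOT constants copied from the wgsim source.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"
SUBST_FLAG = 0x1000  # hypothetical "this position was substituted" marker


def encode(base, substituted=False):
    """Pack a base into the two lowest bits, optionally setting the flag."""
    value = BASE_TO_CODE[base]
    if substituted:
        value |= SUBST_FLAG
    return value


def decode(value):
    """Return (base, was_substituted) for one packed element."""
    return CODE_TO_BASE[value & 0x3], bool(value & SUBST_FLAG)


def differences(reference, hap1, hap2):
    """List (position, ref_base, hap1_base, hap2_base) where either haplotype differs."""
    found = []
    for pos, ref_base in enumerate(reference):
        base1, changed1 = decode(hap1[pos])
        base2, changed2 = decode(hap2[pos])
        if changed1 or changed2:
            found.append((pos, ref_base, base1, base2))
    return found


reference = "ACGT"
hap1 = [encode(b) for b in reference]
hap2 = [encode(b) for b in reference]

# Plant a heterozygous G>T at 0-based position 2: only hap1 is changed.
hap1[2] = encode("T", substituted=True)

diffs = differences(reference, hap1, hap2)
print(diffs)
```

A planted variant is then just one element of "hap1" and/or "hap2" overwritten before the differences are printed: change one haplotype for a heterozygous variant, both for a homozygous one.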
I also want to do the same thing as your first idea. The steps are:
1) Create two sequences of the diploid.
2) Simulate het and homo variants in these two sequences.
3) Simulate sequencing reads.
Did you solve this problem? Or do you know how to solve it?
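For what it's worth, steps 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, assuming the variants are simple substitutions given as (0-based position, ref, alt) tuples; the function names `apply_snvs`, `make_diploid` and `to_fasta` are made up for the example and do not come from any existing tool.

```python
def apply_snvs(seq, snvs):
    """Return seq with each (0-based position, ref, alt) substitution applied."""
    bases = list(seq)
    for pos, ref, alt in snvs:
        if bases[pos].upper() != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {bases[pos]}")
        bases[pos] = alt
    return "".join(bases)


def make_diploid(reference, hom_snvs, het_snvs):
    """Build two haplotypes: homozygous SNVs go on both, heterozygous on hap1 only."""
    hap1 = apply_snvs(reference, hom_snvs + het_snvs)
    hap2 = apply_snvs(reference, hom_snvs)
    return hap1, hap2


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy reference; a real run would load the chromosome from a FASTA file.
reference = "ACGTACGTAC"
hom_snvs = [(1, "C", "T")]   # present on both haplotypes
het_snvs = [(6, "G", "A")]   # present on hap1 only

hap1, hap2 = make_diploid(reference, hom_snvs, het_snvs)
print(to_fasta("hap1", hap1), end="")
print(to_fasta("hap2", hap2), end="")
```

For step 3, one option would be to write each haplotype to its own FASTA file, simulate reads from each at half the target depth, and pool them. If wgsim is used for that, its own random mutation rate (the `-r` option) would presumably need to be set to 0 so that only the planted variants show up.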
It seems simplest to either modify the reference or get the reads from some publication that found such a variant.
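If you go the modify-the-reference route, it may be worth checking beforehand that the substitution you plant really is non-synonymous. Here is a small sketch, assuming the standard genetic code and a coding sequence on the forward strand that starts in frame; `codon_effect` is a made-up helper name, and positions are 0-based within the CDS.

```python
# Standard genetic code, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_effect(cds, pos, alt):
    """Return (ref_aa, alt_aa) for substituting `alt` at 0-based CDS position `pos`."""
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    alt_codon = ref_codon[:pos % 3] + alt.upper() + ref_codon[pos % 3 + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


cds = "ATGGAAGTT"  # toy CDS: Met-Glu-Val

ref_aa, alt_aa = codon_effect(cds, 4, "T")   # GAA -> GTA
print(ref_aa, alt_aa, "non-synonymous" if ref_aa != alt_aa else "synonymous")

ref_aa2, alt_aa2 = codon_effect(cds, 5, "G")  # GAA -> GAG
print(ref_aa2, alt_aa2, "non-synonymous" if ref_aa2 != alt_aa2 else "synonymous")
```

For a gene on the reverse strand, or one with introns, the genomic coordinate would first have to be mapped onto the spliced, strand-corrected CDS; that step is not covered here.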