Hi all,
I am looking to simulate some paired-end Illumina data for a test. Here is what I want to do, in order of importance (most important first):
- Create FASTQ files
- Specify the exact SNPs (position and base change) that appear in the data
- Control the allelic fractions of the spiked-in SNPs
- Have an appropriate error model for Illumina sequencing
- Have controllable metrics such as duplicate rate and chastity-fail rate
There seem to be a number of tools available for simulating Illumina reads. Does anyone know of one that can handle my requirements?
I checked both wgsim and Sherman, and I didn't see a way to spike in specific variants (base change and position). Am I missing something?
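To make the first three requirements concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not taken from any existing tool: the names `simulate_pairs`, `write_fastq`, and `revcomp` are hypothetical, positions are assumed to be 0-based, and it deliberately has no error model, duplicates, or chastity filtering (every base gets the same placeholder quality).

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(ref, snps, n_pairs, read_len=50, frag_len=150, seed=0):
    """Return a list of (read1, read2) FASTQ records as strings.

    snps maps a 0-based reference position to (alt_base, allelic_fraction).
    Each fragment covering a SNP carries the alt base with that probability.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    records = []
    for i in range(n_pairs):
        # Pick a fragment start uniformly along the reference.
        start = rng.randrange(0, len(ref) - frag_len + 1)
        frag = list(ref[start:start + frag_len])
        # Spike in each SNP that falls inside this fragment.
        for pos, (alt, fraction) in snps.items():
            if start <= pos < start + frag_len and rng.random() < fraction:
                frag[pos - start] = alt
        frag = "".join(frag)
        # Read 1 is the fragment's 5' end; read 2 is the reverse
        # complement of its 3' end.
        r1 = frag[:read_len]
        r2 = revcomp(frag[-read_len:])
        qual = "I" * read_len  # placeholder quality, no error model
        records.append((
            f"@sim_{i}/1\n{r1}\n+\n{qual}\n",
            f"@sim_{i}/2\n{r2}\n+\n{qual}\n",
        ))
    return records


def write_fastq(records, path1, path2):
    """Write read 1 and read 2 records to two separate FASTQ files."""
    with open(path1, "w") as f1, open(path2, "w") as f2:
        for r1, r2 in records:
            f1.write(r1)
            f2.write(r2)
```

For example, `simulate_pairs(ref, {200: ("T", 0.3)}, n_pairs=1000)` would put a T at position 200 in roughly 30% of the fragments that cover it. What I am really after is a tool that does this and also provides the realistic error model and the other metrics listed above.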
Please read my answer in full; I mentioned the variants explicitly.