Is there a way to simulate reads for any number of positions in a VCF file? I found a GATK3 tool that does this, mentioned by Pierre Lindenbaum in the thread "vcf to bam", but I was unable to find the legacy GATK version that contains it. The GATK tool seems a little more sophisticated, since it takes a user-defined read error into account. But correct me if I'm wrong and the other tools posted in that thread do this as well.
Could anyone also outline how accurate the resulting BAM file will be if one uses it to test a variant-calling pipeline?
Best regards,
Berndmann
Do you need to use a VCF as a starting point?
If not, you could use mutate.sh to introduce mutations into a reference (creating a VCF file in the process) and then use randomreads.sh from the BBMap suite to create FASTQ reads from that mutated genome.
Thanks for your answer, GenoMax. What is the more reasonable approach if one needs to start from a VCF and wants to create reads that can be fed to a variant caller to generate a VCF file comparable to the original?
There's another tool, applyvariants.sh, that accepts a reference and a VCF and outputs the altered reference. You can then simulate reads from that altered reference using randomreads.sh.
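To make the idea concrete, here is a toy Python sketch of the same two steps: apply known variants to a reference, then sample reads from the altered sequence with a user-defined per-base error rate. This is not how applyvariants.sh or randomreads.sh work internally. The function names `apply_snvs` and `simulate_reads`, the tuple format for variants, and the uniform substitution error model are all assumptions made for illustration, and only single-nucleotide variants are handled.

```python
import random


def apply_snvs(reference, variants):
    """Apply SNVs to a reference string.

    variants: list of (pos, ref, alt) with 1-based positions, as in a VCF.
    Hypothetical helper for illustration; indels are not supported.
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("this sketch handles only SNVs")
        if seq[pos - 1] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def simulate_reads(genome, n_reads, read_len, error_rate, seed=0):
    """Sample reads uniformly from the genome with substitution errors.

    Returns a list of (0-based start, read sequence) tuples.
    The error model (independent uniform substitutions) is an assumption.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with one of the three other nucleotides.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


reference = "ACGTACGTACGTACGTACGT"
variants = [(3, "G", "T"), (10, "C", "A")]  # made-up example variants
mutated = apply_snvs(reference, variants)
reads = simulate_reads(mutated, n_reads=5, read_len=8, error_rate=0.0)
```

Because the variants you put in are known, they serve as the truth set: after aligning the simulated reads and running your caller, you can compare the called VCF against the input VCF. How informative that comparison is depends on how realistic the error model and coverage are, so a toy model like this one will tend to make a pipeline look better than it would on real data.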