I need to generate "dummy"/"fake", but "formally valid", VCF data to test the performance of processing pipelines.
The need for such data arises in many contexts, but at the moment I am most interested in measuring the performance of alternative approaches to merging large numbers (>10K) of single-sample VCFs.
Most of the VCF data that I have ready access to is protected patient data, which limits what I can do with it (e.g. which cloud servers I can upload it to for processing).
Can anyone recommend a method for generating dummy/fake single-sample VCFs?
What about the 1000 Genomes Project data?
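To make "formally valid" concrete, here is a minimal sketch in Python of what a generator for such files might look like. Everything in it is an assumption for illustration: the function name `make_fake_vcf`, the made-up contig names and lengths, the fixed QUAL of 50, and the rule that derives REF from the position (a stand-in so that different samples agree on REF at a shared site; it does not correspond to any real reference genome). It emits only biallelic SNVs with a GT field, so it would exercise the mechanics of a merge rather than realistic variant content.

```python
import random

BASES = "ACGT"

def make_fake_vcf(sample, n_variants=100, contigs=None, seed=0):
    """Return the text of a minimal single-sample VCF containing random SNVs."""
    rng = random.Random(seed)
    # Hypothetical contigs; not taken from any real assembly.
    contigs = contigs or {"chr1": 1_000_000, "chr2": 800_000}
    names = list(contigs)

    lines = ["##fileformat=VCFv4.2"]
    for name, length in contigs.items():
        lines.append(f"##contig=<ID={name},length={length}>")
    lines.append('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
    lines.append("\t".join(["#CHROM", "POS", "ID", "REF", "ALT",
                            "QUAL", "FILTER", "INFO", "FORMAT", sample]))

    # Draw unique (contig index, position) pairs, then sort them so records
    # appear in contig order and ascending position.
    sites = set()
    while len(sites) < n_variants:
        idx = rng.randrange(len(names))
        sites.add((idx, rng.randint(1, contigs[names[idx]])))

    for idx, pos in sorted(sites):
        # Assumption: REF is a deterministic function of the site, so every
        # generated sample reports the same REF at the same position.
        ref = BASES[(pos + idx) % 4]
        alt = rng.choice([b for b in BASES if b != ref])
        gt = rng.choice(["0/1", "1/1"])
        lines.append("\t".join([names[idx], str(pos), ".", ref, alt,
                                "50", "PASS", ".", "GT", gt]))
    return "\n".join(lines) + "\n"
```

Calling it with a different `sample` and `seed` per file, e.g. `make_fake_vcf("S00001", seed=1)`, would give one file per sample. With uniformly random positions, very few sites will be shared between samples, which is probably not representative of real cohorts, so a realistic merge benchmark would likely need positions drawn from a common pool of sites.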