For example, if I input a reference genome FASTA, can I get simulated FASTQ files like those an ONT or PacBio sequencing run on that genome could have produced?
I'm trying to migrate from Snakemake to Nextflow, but from what I understand there is no option to perform dry runs in Nextflow, so having a small dataset becomes a necessity rather than a recommendation. Are there any tools to help generate such data?
I usually just subset an existing large dataset to a few reads, peaks, genes, or whatever is needed. You would need to add some details about what exactly you want to simulate. From what I hear, people generally use NanoSim (https://github.com/bcgsc/NanoSim) for ONT data.
If the purpose is just to see whether the pipeline runs from start to finish, why not just downsample the real dataset? By the way, Snakemake's dry-run option is one of the features I like most, since even creating and running toy datasets may be a lot of work if you just want to check that the input/output dependencies are correct.
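In case it helps, here is a rough sketch of the downsampling approach suggested above, using only the Python standard library. It assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function names `read_fastq` and `downsample_fastq` and the file names in the usage note are made up for illustration, not part of any existing tool. It uses reservoir sampling so the large input never has to fit in memory.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield record


def downsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random sample of up to n_reads records; return the count written."""
    rng = random.Random(seed)  # fixed seed so the toy dataset is reproducible
    reservoir = []
    with open(in_path) as fin:
        for i, record in enumerate(read_fastq(fin)):
            if i < n_reads:
                # Fill the reservoir with the first n_reads records.
                reservoir.append(record)
            else:
                # Replace an existing entry with probability n_reads / (i + 1).
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
    with open(out_path, "w") as fout:
        for record in reservoir:
            fout.write("\n".join(record) + "\n")
    return len(reservoir)
```

Calling something like `downsample_fastq("big_run.fastq", "toy.fastq", 1000)` would then give a small test input for the pipeline. For paired-end data you would need to sample both files with the same read selection, which this sketch does not handle.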