6.9 years ago
gwotto
Hi, I am developing a pipeline for whole-genome sequencing analysis, including software for sequence alignment (e.g. bwa), quality control, and variant calling (e.g. GATK). In order to write tests and debug the pipeline, I need a small test set. Are there any best practices or instructions for generating such a test set? Or are there any publicly available test sets? Thanks a lot for your help!
Perhaps I should have been more specific. I am not looking for whole-genome data in general. Rather, I would like something like a small section of the genome as fastq reads, plus the same small section as a reference with the appropriate indices, either simulated or from real data. The goal is to have a test set that runs in a couple of minutes rather than hours or days. I would like to know how people generate such a data set, or whether there are existing ones in use. So far I haven't come across any...
If your aim is to simulate data, you can check out the answer by Vijay Lakhujani in this post.
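To give a rough idea of what simulating a tiny test set can look like, here is a minimal sketch in plain Python. It is not the method from the linked answer. It builds a short random reference and samples single-end reads from it with a uniform substitution error rate. All names (`random_reference`, `simulate_reads`, `to_fasta`, `to_fastq`, `chrTest`), the read length, the error rate, and the constant quality string are assumptions made for illustration. For realistic error profiles, a dedicated read simulator is probably the better choice.

```python
import random


def random_reference(length, seed=0):
    """Return a random DNA string; a stand-in for a small real genome region."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_reads(reference, n_reads, read_len=100, error_rate=0.01, seed=1):
    """Sample forward-strand single-end reads with random substitutions.

    Returns a list of (name, sequence, quality) tuples. The 1-based start
    position is stored in the read name so alignments can be checked later.
    """
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        name = f"sim_{i}_pos{start + 1}"
        # Constant quality 'I' (Phred 40 in Sanger encoding): a simplification.
        reads.append((name, "".join(bases), "I" * read_len))
    return reads


def to_fasta(name, sequence, width=60):
    """Format one sequence as FASTA text with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def to_fastq(reads):
    """Format (name, sequence, quality) tuples as four-line FASTQ records."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in reads)


if __name__ == "__main__":
    ref = random_reference(5000)
    reads = simulate_reads(ref, n_reads=500)
    fasta_text = to_fasta("chrTest", ref)
    fastq_text = to_fastq(reads)
    print(len(reads), "reads simulated from a", len(ref), "bp reference")
```

Writing `fasta_text` and `fastq_text` to files gives a reference and a read set that should index and align in seconds. Because the true position is in each read name, the aligner output can be checked against it. A fixed seed keeps the test set reproducible between runs. If you prefer real sequence over a random one, the same idea should work with a small region cut from an actual reference in place of `random_reference`.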