Question

Test set for WGS analysis pipeline

0

6.9 years ago

gwotto • 0

Hi, I am developing a pipeline for whole-genome sequencing analysis that covers alignment (e.g. with bwa), quality control, and variant calling (e.g. with GATK). To write tests and debug the pipeline, I need a small test set. Are there any best practices or instructions for generating such a test set? Or are there any publicly available test sets? Thanks a lot for your help!

genome sequencing alignment next-gen • 2.0k views

ADD COMMENT • link updated 6.9 years ago by MSM55 ▴ 160 • written 6.9 years ago by gwotto • 0

score 0 · Answer 1 · 2017-12-14

0

6.9 years ago

MSM55 ▴ 160

There are a lot of publicly available datasets on NCBI SRA.

ADD COMMENT • link 6.9 years ago by MSM55 ▴ 160

0

Perhaps I should have been more specific. I am not looking for whole-genome data in general; rather, I would like something like a small section of the genome as FASTQ reads and a small section of the genome as a reference with the appropriate indices, either simulated or from real data. The goal is to have a test set that runs in a couple of minutes rather than hours or days. I would like to know how people generate such a data set, or whether any are already in use. So far I haven't come across any...
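
To make the "small section of the genome as reference" part concrete, here is a minimal sketch in plain Python that cuts one region out of a FASTA file. The function names (`read_fasta`, `extract_region`, `write_fasta`) and the 1-based inclusive coordinates are assumptions made for illustration; in practice a tool such as `samtools faidx` is the usual way to do this, and the resulting small FASTA would still need to be indexed for the aligner.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the contig name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_region(handle, contig, start, end):
    """Return bases start..end (1-based, inclusive) of the named contig."""
    for name, seq in read_fasta(handle):
        if name == contig:
            return seq[start - 1:end]
    raise KeyError(f"contig not found: {contig}")


def write_fasta(name, seq, handle, width=60):
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    handle.write(f">{name}\n")
    for i in range(0, len(seq), width):
        handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Toy input; a real run would open the genome FASTA instead.
    toy = io.StringIO(">chr1 toy contig\nACGTACGT\nTTGGCCAA\n")
    region = extract_region(toy, "chr1", 5, 12)
    out = io.StringIO()
    write_fasta("chr1_5_12", region, out)
    print(out.getvalue(), end="")
```

Note that this reads whole contigs into memory, which is fine for a toy case but slow for a full human genome; an indexed lookup avoids that.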

ADD REPLY • link 6.9 years ago by gwotto • 0

1

If your aim is to simulate data, you can check out the answer by Vijay Lakhujani in this post.
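
For a rough idea of what simulating reads involves, here is a minimal sketch (not the method from the linked answer) that draws paired-end reads from a short reference string and formats them as FASTQ. All names and defaults (`simulate_pairs`, `read_len`, `insert`, `error_rate`, the constant quality character) are hypothetical choices for illustration; dedicated simulators such as wgsim or ART model errors and qualities far more realistically.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutate(seq, rate, rng):
    """Introduce substitution errors independently at each base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pairs(ref, n_pairs, read_len=100, insert=300,
                   error_rate=0.01, seed=0):
    """Yield (name, read1, read2) for fragments sampled uniformly from ref.

    read1 is the forward start of the fragment; read2 is the reverse
    complement of its end, as in a typical paired-end library.
    """
    if read_len > insert or insert > len(ref):
        raise ValueError("need read_len <= insert <= len(ref)")
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    for i in range(n_pairs):
        start = rng.randint(0, len(ref) - insert)
        frag = ref[start:start + insert]
        r1 = mutate(frag[:read_len], error_rate, rng)
        r2 = mutate(revcomp(frag[-read_len:]), error_rate, rng)
        # Encode the 1-based fragment start in the read name.
        yield f"sim_{i}_{start + 1}", r1, r2


def fastq_record(name, seq, qual_char="I"):
    """Format one FASTQ record with a constant quality ('I' = Phred 40)."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    reference = "ACGTTGCA" * 100  # toy reference; use a real region instead
    with open("test_R1.fastq", "w") as f1, open("test_R2.fastq", "w") as f2:
        for name, r1, r2 in simulate_pairs(reference, n_pairs=1000):
            f1.write(fastq_record(name + "/1", r1))
            f2.write(fastq_record(name + "/2", r2))
```

Because the fragment start is written into each read name, the simulated reads can later be compared against where the aligner places them, which is handy for pipeline tests.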

ADD REPLY • link 6.9 years ago by Tm ★ 1.1k