Test set for WGS analysis pipeline
6.9 years ago
gwotto • 0

Hi, I am developing a pipeline for whole-genome sequencing analysis, including sequence alignment (e.g. with bwa), quality control, and variant calling (e.g. with GATK). To write tests and debug the pipeline, I need a small test set. Are there any best practices or instructions for generating such a test set? Or are there any publicly available test sets? Thanks a lot for your help!

genome sequencing alignment next-gen • 2.0k views
6.9 years ago
MSM55 ▴ 160

There are a lot of publicly available datasets on the NCBI SRA.


Perhaps I should have been more specific. I am not looking for whole-genome data in general. Rather, I would like to have a small section of the genome as FASTQ reads, plus the matching small section of the genome as a reference with the appropriate indices, either simulated or from real data. The goal is a test set that runs in a couple of minutes rather than hours or days. I would like to know how people generate such a data set, or whether there are existing ones in use. So far I haven't come across any...


If your aim is to simulate data, then you can check out the answer by Vijay Lakhujani in this post.
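For the simulation route in general, here is a minimal Python sketch of the idea. It is not the method from the linked answer, and dedicated read simulators model errors and paired ends far more realistically. It assumes an uppercase reference string containing only A, C, G, T, N, produces single-end reads with uniform substitution errors and a constant placeholder quality, and all function names are made up for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def simulate_reads(reference, n_reads=1000, read_len=100, error_rate=0.01, seed=0):
    """Yield (name, sequence, quality) tuples sampled uniformly from reference.

    Hypothetical helper: single-end reads, substitution errors only,
    constant quality string. A fixed seed keeps the test set reproducible.
    """
    if len(reference) < read_len:
        raise ValueError("reference is shorter than the read length")
    rng = random.Random(seed)
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        # Introduce random substitutions at the requested rate.
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        seq = "".join(bases)
        strand = "+"
        # Sample roughly half of the reads from the reverse strand.
        if rng.random() < 0.5:
            seq = reverse_complement(seq)
            strand = "-"
        # The read name records the 1-based origin, which helps when debugging.
        yield f"sim_{i}_pos{start + 1}_{strand}", seq, "I" * read_len


def format_fasta(name, seq, width=60):
    """Return a FASTA record with the sequence wrapped at width columns."""
    lines = [seq[k:k + width] for k in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def format_fastq(reads):
    """Return FASTQ text for an iterable of (name, sequence, quality) tuples."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in reads)
```

To use it, write the output of `format_fasta` and `format_fastq` to files, then index the small reference as usual (e.g. `bwa index`) before running the pipeline on it. To start from real data instead of a random sequence, the reference string could be a region extracted from an existing genome, for example with `samtools faidx`.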
