Hi all,
I am currently building a pipeline for Illumina NGS read alignment, SNP/indel calling and association testing across multiple samples.
Can anyone recommend a data set for testing a pipeline like this, right through to the stage of calling associated mutations?
Alternatively, is there an existing means of generating such a data set?
Any tips would be greatly appreciated.
Thanks in advance!
A good suggestion. At its very simplest, I just want to run through a workflow with a few samples (say, 2 phenotypes with 5 members per group and known causal mutations present). I know this isn't scientifically correct - it's simply to get a feel for the workflow, the filtering of unassociated mutations, etc.
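For what it's worth, the toy design described above (2 phenotypes, 5 samples per group, one known causal mutation) could be mocked up at the genotype level with something like the sketch below. This is only an illustration under stated assumptions: the function names, the 100-SNP panel, the 0.3 background allele frequency and the fully penetrant causal site are all made up for the example. It also skips reads entirely, so it would only exercise the association and filtering end of a pipeline, not alignment or variant calling.

```python
import random


def simulate_genotypes(n_per_group=5, n_snps=100, causal_index=10, seed=0):
    """Return (genotypes, phenotypes) for a toy case/control set.

    genotypes[i][j] is the alt-allele count (0, 1 or 2) of sample i at SNP j.
    All parameter values are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)
    phenotypes = [1] * n_per_group + [0] * n_per_group  # 1 = case, 0 = control
    genotypes = []
    for pheno in phenotypes:
        # Background SNPs: two independent allele draws at an assumed
        # alt-allele frequency of 0.3, unrelated to phenotype.
        row = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n_snps)]
        # Planted causal SNP: cases homozygous alt, controls homozygous ref.
        row[causal_index] = 2 if pheno == 1 else 0
        genotypes.append(row)
    return genotypes, phenotypes


def allele_count_difference(genotypes, phenotypes):
    """Crude per-SNP score: |alt alleles in cases - alt alleles in controls|."""
    scores = []
    for j in range(len(genotypes[0])):
        case = sum(g[j] for g, p in zip(genotypes, phenotypes) if p == 1)
        ctrl = sum(g[j] for g, p in zip(genotypes, phenotypes) if p == 0)
        scores.append(abs(case - ctrl))
    return scores


genotypes, phenotypes = simulate_genotypes()
scores = allele_count_difference(genotypes, phenotypes)
# Rank SNPs by score; the planted site should sit at or near the top.
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("Top 5 SNP indices by score:", ranked[:5])
```

The score here is just a difference in allele counts rather than a proper association test, which seems in keeping with the "get a feel for the workflow" goal. With only 5 samples per group, some background SNPs will score fairly high by chance, which is handy for practising the filtering step.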