I have some pooled GBS data (96 samples) that was generated by two runs on an Illumina HiSeq 4000. The DNA was size-selected to between 100 and 200 bp. The output is two fastq.gz files (run 1 and run 2). Each sample has a unique barcode, and the barcodes are listed in a CSV file.
In addition, I have a reference genome (as a single FASTA file) for the organism in question.
What I want is the genotype (as SNPs) for each sample. So I need to demultiplex my data, ideally discard reads of too low quality, align the remaining reads to the reference, generate a pileup, call SNPs, etc.
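To make the first two steps concrete, here is a minimal Python sketch of what I mean by demultiplexing and quality filtering. It is only an illustration, not something I would expect to be fast. It assumes things I have not checked against my real data: that the barcode sits at the very start of each read (inline barcodes), that the CSV has two columns (sample name, barcode) with no header, that qualities are Phred+33, and that only exact barcode matches are accepted. The function names, the file names in the comments and the mean-quality threshold of 20 are all made up for the example.

```python
import csv
from collections import defaultdict


def read_barcodes(csv_path):
    """Return {barcode: sample}. Assumes rows of 'sample,barcode' with no header."""
    with open(csv_path, newline="") as handle:
        return {row[1].strip().upper(): row[0].strip()
                for row in csv.reader(handle) if len(row) >= 2}


def parse_fastq(handle):
    """Yield (header, sequence, quality) from an open text-mode FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def demultiplex(records, barcodes, min_mean_q=20):
    """Group reads by sample via exact barcode prefix match, then trim the
    barcode and drop reads whose mean quality is below min_mean_q."""
    # Try longer barcodes first, in case one barcode is a prefix of another.
    ordered = sorted(barcodes.items(), key=lambda kv: len(kv[0]), reverse=True)
    by_sample = defaultdict(list)
    for header, seq, qual in records:
        for barcode, sample in ordered:
            if seq.startswith(barcode):
                trimmed_seq = seq[len(barcode):]
                trimmed_qual = qual[len(barcode):]
                if trimmed_qual and mean_phred(trimmed_qual) >= min_mean_q:
                    by_sample[sample].append((header, trimmed_seq, trimmed_qual))
                break  # a read belongs to at most one sample
    return by_sample


# Hypothetical usage (file names are placeholders):
#   import gzip
#   with gzip.open("run1.fastq.gz", "rt") as fq:
#       reads = demultiplex(parse_fastq(fq), read_barcodes("barcodes.csv"))
```

Reads whose start matches no barcode are silently dropped here. A real tool would presumably also tolerate barcode mismatches and check for the restriction-enzyme cut site, which this sketch does not attempt.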
Writing in January 2018, is there a preferred pipeline for this? I think I could do everything using Stacks, but other approaches might be available that offer greater speed or some other benefit. All else being equal, a faster approach would be preferred.
I have used PyRAD before, and it was very slow. I would have hoped that natural selection acting on the proliferation of alternative methods would have produced a sensible standard by now: sensible in terms of ease of use, results and speed.