From pooled fastq data to SNPs
6.8 years ago
Fedster ▴ 30

I have some pooled GBS data (96 samples) that was generated by two runs on an Illumina HiSeq 4000. The DNA was size-selected to between 100 and 200 bp. The output is two fastq.gz files (run 1 and run 2). Each sample has a unique barcode; the barcodes are listed in a CSV file.

In addition I have a reference genome (as a single fasta file) for the organism in question.

What I want is the genotype (as SNPs) for each sample. To get there I need to demultiplex my data, ideally discard fragments of too low quality, align the remaining fragments to the reference, build a pileup, call SNPs, etc.
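For illustration, the first two steps (demultiplexing and quality filtering) could be sketched in plain Python as below. This is only a rough sketch under several assumptions that are not stated in the post: the CSV has two columns (sample name, barcode) and no header row, the barcodes are inline at the start of each read, matching is exact, and quality scores are Phred+33. The function names and the `min_mean_q` threshold are hypothetical, and enzyme cut-site remnants are not handled.

```python
import csv
import gzip
from itertools import islice


def load_barcodes(csv_path):
    """Read an assumed two-column CSV (sample, barcode) into {barcode: sample}."""
    with open(csv_path, newline="") as handle:
        return {row[1].strip().upper(): row[0].strip()
                for row in csv.reader(handle) if len(row) >= 2}


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-lines-per-record FASTQ."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in qual) / len(qual)


def demultiplex(fastq_gz, barcodes, min_mean_q=20):
    """Assign reads to samples by exact barcode prefix.

    Reads with no matching barcode, or whose mean quality after barcode
    trimming is below min_mean_q, are dropped.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    # Try longer barcodes first, since GBS barcodes can vary in length.
    lengths = sorted({len(barcode) for barcode in barcodes}, reverse=True)
    with gzip.open(fastq_gz, "rt") as handle:
        for header, seq, qual in read_fastq(handle):
            for n in lengths:
                sample = barcodes.get(seq[:n].upper())
                if sample is None:
                    continue
                trimmed_seq, trimmed_qual = seq[n:], qual[n:]
                if trimmed_qual and mean_quality(trimmed_qual) >= min_mean_q:
                    by_sample[sample].append((header, trimmed_seq, trimmed_qual))
                break
    return by_sample
```

Holding every read in memory is not realistic for two HiSeq 4000 runs; a real run would stream each read to a per-sample output file instead, or use an existing demultiplexer. The per-sample reads would then go on to alignment and SNP calling.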

Writing in January 2018, is there a preferred pipeline for this? I think I could do everything using Stacks, but other approaches might be available and offer greater speed or some other benefit. All else being equal, a faster approach would be preferred.

fastq demultiplexing barcode GBS SNP • 1.8k views
6.8 years ago
bari.ballew ▴ 470

You probably want to check out the Broad Institute's GATK Best Practices (https://software.broadinstitute.org/gatk/best-practices/). Look mainly at the sections on data pre-processing and germline SNPs + indels. Alternatively, you could check out the bcbio pipeline (http://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html).

Note that you're basically asking how to analyze sequencing data from start to finish, so depending on your background, this will likely take a good bit of effort, both in reading and understanding the pipelines and in implementing them. Best of luck!


I used PyRAD before, and it was slow as a rock. I would have hoped that natural selection among the many alternative methods would have produced a sensible standard by now: sensible in terms of ease of use, results, and speed.
