Hi all,
Can anybody describe a high-level pipeline for analyzing SNP arrays (either Illumina or Affymetrix platform), starting from the raw data hot off the array machine (intensity files?)? It would be great if you could point to commonly used software at each step, or to tutorials with sample files. More specifically, I am piecing together my understanding of the steps (below), and any feedback, corrections, additions, or tutorials would be greatly appreciated. Thank you.
1) raw file from SNP array (tiff files)
QC and quantification through platform-dependent software (GenomeStudio? R module readIllumina?)
2) csv files with one SNP probe per row and the quantified intensities (top/bottom strand, A and B alleles?)
SNP calling, again using GenomeStudio? R module readIllumina?
3) csv files with SNP calls and all the QC-related statistics (positions, GT scores, Cluster Sep, theta, R, etc.; what are these anyway?)
Imputation against a reference panel using MaCH or BEAGLE?
4) csv files or PLINK files that associate each high-confidence SNP with dbSNP and its corresponding final call (AA, AB, or BB)
GWAS analysis using PLINK
5) GWAS analysis results (end point)
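
To make the theta and R columns in step 3 concrete, here is my current understanding as a small Python sketch; please correct me if it is off. As far as I can tell, for Illumina arrays theta and R are just the polar-style coordinates of the normalized A-allele (X) and B-allele (Y) intensities: R = X + Y and theta = (2/pi) * arctan(Y/X). The function names and the fixed theta/R cutoffs below are my own assumptions for illustration only. My understanding is that real genotype calling compares each sample to per-SNP cluster positions rather than to fixed thresholds.

```python
import math


def theta_r(x, y):
    """Convert normalized allele intensities to (theta, R).

    x: normalized A-allele intensity, y: normalized B-allele intensity.
    theta runs from 0 (pure A signal) to 1 (pure B signal);
    R is the total signal intensity.
    """
    r = x + y
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return theta, r


def naive_call(x, y, low=0.25, high=0.75, min_r=0.2):
    """Toy genotype call from theta alone.

    The cutoffs (low, high, min_r) are hypothetical values chosen for
    illustration; they are not taken from any vendor software.
    """
    theta, r = theta_r(x, y)
    if r < min_r:
        return "NC"  # no call: total signal too weak
    if theta < low:
        return "AA"
    if theta > high:
        return "BB"
    return "AB"


if __name__ == "__main__":
    # Three made-up samples: mostly A signal, balanced, mostly B signal.
    for x, y in [(1.0, 0.05), (0.6, 0.55), (0.04, 0.9)]:
        theta, r = theta_r(x, y)
        print(f"X={x} Y={y} theta={theta:.3f} R={r:.3f} call={naive_call(x, y)}")
```

If this picture is right, then a SNP with three tight, well-separated theta clusters across samples would be one that clusters well, which I am guessing is what the cluster separation statistic is summarizing.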