I have Illumina paired-end reads mapped to a reference genome using Bowtie. I created an mpileup using samtools and identified SNPs from the mpileup using a variant caller (VarScan), which gave me output in VCF format. I need to do SNP cluster analysis. Is there any software or R package available for SNP cluster analysis?
Like the other answers, I am not sure what you mean by cluster analysis using SNPs. However, you can look at the distribution and frequency of SNPs in windows of a fixed size across the genome. That shows whether SNPs cluster in a particular region, such as part of a chromosome; a tool called Circos can plot this (it has a tutorial). Another way of grouping SNPs is by their predicted effect on a gene, such as synonymous, non-synonymous, or stop codon variants, using the Ensembl tool called Variant Effect Predictor. Those are the two kinds of clustering I can think of. A third applies when you have sequenced more than one genome of the same species and want to see whether they are closely related.
Let me know which one you mean, or whether it is something other than these examples, so that I can be more helpful.
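In case it helps, here is a minimal Python sketch of the windowed SNP count described above. It only assumes a VCF whose first two tab-separated columns are chromosome and position; the function name `snp_window_counts`, the 1000 bp window, and the example records are made up for illustration and are not from any particular tool.

```python
from collections import Counter

def snp_window_counts(vcf_lines, window=1000):
    """Count variant records per (chromosome, window start) bin."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and all header lines ("##..." and "#CHROM...")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # VCF positions are 1-based; bins here are 0-based, half-open
        start = (pos - 1) // window * window
        counts[(chrom, start)] += 1
    return counts

# Hypothetical records, only to show the shape of the input
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t950\t.\tC\tT\t.\tPASS\t.",
    "chr1\t1001\t.\tG\tA\t.\tPASS\t.",
    "chr2\t5\t.\tT\tC\t.\tPASS\t.",
]

# Print one "chrom start end count" row per non-empty window
for (chrom, start), n in sorted(snp_window_counts(example).items()):
    print(chrom, start, start + 1000, n)
```

For a real file you would pass an open file handle instead of the list. The "chrom start end count" rows are close to what a histogram-style plot needs, but check the chromosome naming and coordinate conventions of whatever plotting tool you use before feeding them in.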
If you are asking about burden testing, try the R package AssotesteR.
You simply code your phenotype as 1 / 0 for case / control and supply a genotype matrix with 1 for the alternate allele and 0 for the reference, with SNPs in columns and samples in rows.
Once your data is in that format, you can perform a variety of multi-locus tests including, for example, C-alpha.
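To make the input format concrete, here is a rough Python sketch that builds such a 0/1 matrix (samples in rows, SNPs in columns) from a multi-sample VCF. It assumes every record has a FORMAT column containing GT, treats missing genotypes as 0 (a simplification), and uses a made-up phenotype vector. The per-sample carrier count at the end is only a naive summary, not C-alpha or any other formal test.

```python
def genotype_matrix(vcf_lines):
    """Return (samples, snps, matrix) with matrix[sample][snp] in {0, 1}."""
    samples, snps, columns = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        snps.append(f"{fields[0]}:{fields[1]}")
        gt_index = fields[8].split(":").index("GT")
        column = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # 1 if any allele is non-reference; missing (".") is coded 0 here
            column.append(int(any(a not in ("0", ".") for a in alleles)))
        columns.append(column)
    # Transpose so that rows are samples and columns are SNPs
    matrix = [list(row) for row in zip(*columns)]
    return samples, snps, matrix

# Hypothetical three-sample VCF
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:12\t./.:0\t0/1:9",
]

samples, snps, matrix = genotype_matrix(example)
phenotype = [1, 0, 1]  # hypothetical case (1) / control (0) labels, one per sample

# Naive burden summary: number of alternate-carrying SNPs per sample
burden = [sum(row) for row in matrix]
for name, status, load in zip(samples, phenotype, burden):
    print(name, status, load)
```

You could write `matrix` and `phenotype` out as a table and read them into R for the actual tests.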
What do you mean by clustering? There are many different types of analyses that use clustering for NGS data...
One could even ask: WHY do you need to do cluster analysis? Just because somebody told you to?
Just out of curiosity. I want to try clustering analysis. Any info or URLs to get started are welcome.