SNP data analysis: from VCF file to statistical tests
4.9 years ago
Mehmet ▴ 820

Dear all, I have a VCF file produced by GATK. It contains 48 populations of one species, called across 50 scaffolds, sampled from different locations and host species. I need to run statistical tests and other analyses between populations. Because some tests (Fst, Tajima's D, etc.) are mostly used for two populations, I need to combine populations into groups. For example, populations A, B, and C come from the first location and populations D, E, F, and G from the second. Likewise, populations A, B, and C share one host while the other populations belong to another host, and so on.

I would like advice on grouping populations (7 populations) into two groups. Is it acceptable to put the data for populations A, B, and C into one file and the data for D, E, F, and G into another, and then treat these as two populations for statistical tests and downstream analysis (e.g., tests for natural selection)? Any advice on approach and tools would be appreciated.

Thanks.

SNP next-gen genome • 1.1k views
4.8 years ago

Hey, it is possible to perform statistical tests directly on the VCF using SnpSift CaseControl. However, I would encourage you to devote a day to converting your data to PLINK format and performing your analyses there.
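For the two-group comparison described in the question, one option is to keep a single dataset and label each sample with its group, rather than splitting the data into two files. The following is only a minimal sketch: the sample IDs and group names are hypothetical, and it assumes that both the family ID and the individual ID equal the VCF sample ID. It writes lines in a three-column "FID IID cluster" layout, which is the layout I believe PLINK 1.9 expects for `--within`; please check that against the PLINK documentation before relying on it.

```python
from typing import Dict, List

# Hypothetical mapping from population label to group.
# The labels A-G follow the example in the question; the group names are made up.
POP_TO_GROUP: Dict[str, str] = {
    "A": "location1", "B": "location1", "C": "location1",
    "D": "location2", "E": "location2", "F": "location2", "G": "location2",
}


def cluster_lines(sample_to_pop: Dict[str, str],
                  pop_to_group: Dict[str, str]) -> List[str]:
    """Return one 'FID IID GROUP' line per sample, in input order."""
    lines = []
    for sample, pop in sample_to_pop.items():
        if pop not in pop_to_group:
            raise KeyError(f"population {pop!r} of sample {sample!r} has no group")
        # Assumption: FID and IID are both set to the VCF sample ID.
        lines.append(f"{sample} {sample} {pop_to_group[pop]}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample IDs, each tagged with its population label.
    samples = {"s1": "A", "s2": "D", "s3": "C"}
    print("\n".join(cluster_lines(samples, POP_TO_GROUP)))
```

A second mapping (for example host instead of location) gives a second cluster file from the same genotype data, so the location and host comparisons do not need separate copies of the VCF.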

Caution: when converting from VCF to PLINK format, PLINK can re-order the samples in your VCF. Use the --indiv-sort flag to control the sample order. This is very important for keeping the genotypes aligned with your phenotype (FAM) data.
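A quick way to confirm the sample order after conversion is to compare the sample columns of the VCF header with the individual IDs in the .fam file. This is a rough sketch rather than a definitive check: the function names are my own, and it assumes an uncompressed, tab-delimited VCF header and a whitespace-delimited .fam file whose second column holds the individual ID.

```python
from typing import Iterable, List, Optional, Tuple


def vcf_sample_order(vcf_lines: Iterable[str]) -> List[str]:
    """Return sample IDs in the order they appear on the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; sample IDs follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def fam_sample_order(fam_lines: Iterable[str]) -> List[str]:
    """Return individual IDs (column 2) from .fam lines, skipping blank lines."""
    return [line.split()[1] for line in fam_lines if line.strip()]


def first_mismatch(a: List[str],
                   b: List[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, a_item, b_item) at the first difference, or None if equal."""
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            return (i, x, y)
    return None


if __name__ == "__main__":
    # Tiny in-memory example with hypothetical sample IDs.
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts2\ts1\n"
    fam = ["s1 s1 0 0 0 -9", "s2 s2 0 0 0 -9"]
    print(first_mismatch(vcf_sample_order([header]), fam_sample_order(fam)))
```

If `first_mismatch` returns anything other than `None`, the phenotype rows need to be re-matched by sample ID before running any test.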

Kevin
