Hi guys,
I am a newbie in population genetics. I am using next-gen sequencing to re-sequence the genomes of approximately 100 samples. My data set will have approximately 200,000 SNP loci, and I want to calculate some statistics such as pairwise Fst, Hardy-Weinberg equilibrium, linkage disequilibrium and some neutrality tests.
I have been learning to use Arlequin with a reduced dummy data set, and I find the software very straightforward to use. However, I've realised that its input file is limited to 250,000 characters, and my data set will exceed that limit.
So my question is: can anyone recommend alternative software, or another approach, that I can use to calculate these statistics?
Thanks,
I didn't realise PLINK calculates F-statistics. I find PLINK quite hard to use, especially when converting my VCF file to a PLINK-compatible format... I guess I will have to learn. I may try trimming my data set on MAF as you have suggested. Thank you.
VCFtools has a converter to PLINK format.
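In case it helps, here is a minimal sketch of that conversion wrapped in Python. It only builds the command line, using the `--vcf`, `--maf`, `--plink` and `--out` options of VCFtools. The function name, the file names `samples.vcf` and `samples_plink`, and the 0.05 MAF cutoff are placeholders I made up for illustration, not values from this thread.

```python
import os
import shlex
import shutil
import subprocess
from typing import List, Optional


def vcf_to_plink_cmd(vcf_path: str, out_prefix: str,
                     maf: Optional[float] = None) -> List[str]:
    """Build a vcftools command that writes PLINK .ped/.map files.

    If maf is given, sites below that minor allele frequency are
    dropped before the conversion.
    """
    cmd = ["vcftools", "--vcf", vcf_path]
    if maf is not None:
        if not 0.0 <= maf <= 0.5:
            raise ValueError("maf must be between 0 and 0.5")
        cmd += ["--maf", str(maf)]
    cmd += ["--plink", "--out", out_prefix]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and cutoff, for illustration only.
    cmd = vcf_to_plink_cmd("samples.vcf", "samples_plink", maf=0.05)
    print(shlex.join(cmd))
    # Run only if vcftools is installed and the input file exists.
    if shutil.which("vcftools") and os.path.exists("samples.vcf"):
        subprocess.run(cmd, check=True)
```

The printed line can also be pasted straight into a terminal. For the placeholder values above, the output should be `samples_plink.ped` and `samples_plink.map`.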
I forgot to mention that VCFtools can also calculate Fst.
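For pairwise Fst, a rough sketch along the same lines: VCFtools takes one `--weir-fst-pop` option per population, each pointing to a text file with one sample ID per line, and reports Weir and Cockerham's Fst. The helper below just builds one command per pair of populations. The population names, file names and output prefix are hypothetical, so adjust them to your own data.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def weir_fst_cmd(vcf_path: str, pop_files: List[str],
                 out_prefix: str) -> List[str]:
    """Build a vcftools command for Weir and Cockerham's Fst."""
    if len(pop_files) < 2:
        raise ValueError("need at least two population files")
    cmd = ["vcftools", "--vcf", vcf_path]
    for pop_file in pop_files:
        # One --weir-fst-pop option per population file.
        cmd += ["--weir-fst-pop", pop_file]
    cmd += ["--out", out_prefix]
    return cmd


def pairwise_fst_cmds(vcf_path: str, pops: Dict[str, str],
                      out_prefix: str) -> Dict[Tuple[str, str], List[str]]:
    """Build one Fst command for every pair of populations.

    pops maps a population name to its sample-list file.
    """
    cmds = {}
    for a, b in combinations(sorted(pops), 2):
        out = f"{out_prefix}_{a}_vs_{b}"
        cmds[(a, b)] = weir_fst_cmd(vcf_path, [pops[a], pops[b]], out)
    return cmds


if __name__ == "__main__":
    # Hypothetical populations and file names, for illustration only.
    pops = {"north": "north.txt", "south": "south.txt", "east": "east.txt"}
    for pair, cmd in pairwise_fst_cmds("samples.vcf", pops, "fst").items():
        print(pair, " ".join(cmd))
```

Each command should write its per-site results to a file ending in `.weir.fst` under the given output prefix, and the run log should include the genome-wide estimates.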