6.5 years ago
bha
I used the 1000 Genomes data sets to simulate sequences. After the simulation, I have 10K individuals for some other analyses. How can I check the relatedness of the simulated individuals to the original individuals? Should I first calculate a simple correlation of MAF between the two data sets, or is there a better method? Both data sets are in PLINK format. Any suggestions, please?
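The MAF comparison proposed in the question could be sketched in Python/NumPy as below. This assumes each PLINK data set has already been exported to an individuals x SNPs matrix of 0/1/2 allele counts plus a list of SNP IDs (for example with PLINK's `--recode A`); the function names `maf` and `maf_correlation` are hypothetical. Keep in mind that this compares allele frequencies at the population level only; it cannot tell you which simulated individual is related to which original one.

```python
import numpy as np


def maf(geno: np.ndarray) -> np.ndarray:
    """Minor allele frequency of each SNP (column).

    geno: individuals x SNPs matrix of allele counts (0/1/2), np.nan = missing.
    """
    p = np.nanmean(geno, axis=0) / 2.0  # frequency of the counted allele
    return np.minimum(p, 1.0 - p)       # fold to the minor allele


def maf_correlation(geno_a, snps_a, geno_b, snps_b) -> float:
    """Pearson correlation of MAF across the SNP IDs present in both sets."""
    col_b = {s: i for i, s in enumerate(snps_b)}           # SNP ID -> column in set B
    idx_a = [i for i, s in enumerate(snps_a) if s in col_b]  # shared SNPs, set A order
    idx_b = [col_b[snps_a[i]] for i in idx_a]              # same SNPs in set B
    return float(np.corrcoef(maf(geno_a[:, idx_a]), maf(geno_b[:, idx_b]))[0, 1])
```

Because MAF is folded to the minor allele, it does not matter whether the two files count the same allele at each SNP; that is one reason this check is easy, and also why it is only a coarse sanity check.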
Merge both data sets, throw away SNPs with MAF <= 5%, and remove SNPs in LD (pairwise pruning). Then use KING to estimate kinship. Usually one needs roughly 40K to 90K SNPs to infer kinship.

Edit: 06/21/2018
~20K SNPs are also enough to infer kinship. KING download: http://people.virginia.edu/~wc9c/KING/Download.htm
Also, the KING manual advises against pruning good-quality SNPs: http://people.virginia.edu/~wc9c/KING/manual.html
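The steps in this answer (merge, MAF filter, LD pruning, kinship) could be sketched end to end in Python/NumPy on 0/1/2 allele-count matrices. This is illustrative only, not a replacement for PLINK and KING: all function names are hypothetical, the matrices are assumed to have no missing calls and columns sorted by position, `ld_prune` is a simplified greedy variant (not PLINK's `--indep-pairwise` algorithm), and `king_robust` implements the KING-robust kinship estimator phi = (N_Aa,Aa - 2*N_AA,aa) / (2*N_min) + 1/2 - (N_i + N_j) / (4*N_min), where N_i and N_j are the heterozygote counts of the two individuals and N_min is the smaller of the two.

```python
import numpy as np


def merge(geno_a, snps_a, geno_b, snps_b):
    """Stack the individuals of both sets on their shared SNP IDs.

    Assumes both matrices count the SAME allele at every shared SNP.
    """
    col_a = {s: i for i, s in enumerate(snps_a)}
    col_b = {s: i for i, s in enumerate(snps_b)}
    shared = [s for s in snps_a if s in col_b]  # keep set A's SNP order
    return np.vstack([geno_a[:, [col_a[s] for s in shared]],
                      geno_b[:, [col_b[s] for s in shared]]])


def maf_filter(geno: np.ndarray, min_maf: float = 0.05) -> np.ndarray:
    """Drop SNPs with MAF <= min_maf (keeps only MAF > 5% by default)."""
    p = geno.mean(axis=0) / 2.0
    return geno[:, np.minimum(p, 1.0 - p) > min_maf]


def ld_prune(geno: np.ndarray, window: int = 50, r2_max: float = 0.5) -> np.ndarray:
    """Greedy pairwise pruning: drop a SNP whose r^2 with an already kept SNP
    at most `window` columns earlier exceeds r2_max."""
    keep = []
    for j in range(geno.shape[1]):
        in_ld = False
        for k in reversed(keep):
            if j - k > window:
                break  # remaining kept SNPs are even further away
            r = np.corrcoef(geno[:, k], geno[:, j])[0, 1]
            if r * r > r2_max:
                in_ld = True
                break
        if not in_ld:
            keep.append(j)
    return geno[:, keep]


def king_robust(geno: np.ndarray) -> np.ndarray:
    """Pairwise KING-robust kinship matrix (individuals x individuals)."""
    het = (geno == 1).astype(float)
    hom_ref = (geno == 0).astype(float)
    hom_alt = (geno == 2).astype(float)
    n_het_het = het @ het.T                                 # both heterozygous
    n_opp_hom = hom_ref @ hom_alt.T + hom_alt @ hom_ref.T   # opposite homozygotes
    n_het = het.sum(axis=1)                                 # heterozygotes per person
    n_min = np.minimum.outer(n_het, n_het)
    n_sum = np.add.outer(n_het, n_het)
    return (n_het_het - 2 * n_opp_hom) / (2 * n_min) + 0.5 - n_sum / (4 * n_min)
```

With the original individuals stacked first, the block `kin[:n_orig, n_orig:]` of `kin = king_robust(ld_prune(maf_filter(merge(...))))` holds the original-vs-simulated pairs. The KING manual reads kinship > 0.354 as duplicates/MZ twins, 0.177-0.354 as first-degree, 0.0884-0.177 as second-degree and 0.0442-0.0884 as third-degree relatives. Given the manual's advice against pruning good SNPs, the `ld_prune` step can simply be skipped.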