Hi all,
We are genotyping ~2000 SNPs in a panel using an NGS method. We want to build a metric that measures the relationship of one sample against all samples in our database, so that we avoid collecting samples that already exist in our lab.
I used to use PLINK to calculate IBD scores for GWAS data; however, after selecting the SNPs with MAF > 0.3, only 500 SNPs are left.
And as the number of samples increases, this becomes hard, since it takes a long time to go through all the samples.
Is there a convenient way to calculate relationships in my case? For example, to detect the same sample, identical/non-identical twins...
Thanks,
Junfeng
Is the problem that you do not know how to convert the NGS variants into PLINK format (for IBD), or is it a computational problem of having too much data?
I can definitely use vcftools, or PLINK directly, to convert VCF to PED and then use PLINK to calculate IBD. However, the genotypes of the samples are stored in a database. If I want to calculate pairwise IBD for 1000 new samples against the database, I need to extract the genotypes from the database, convert them to PED, and then merge the 1000 new samples with these data.
This is tough because the database will be updated once the 1000 new samples have passed the check and been added. Also, PLINK calculates IBD for every pair of samples in the PED file, whereas I only need 1000 * count(database) comparisons.
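
One way to restrict the work to the 1000 * count(database) pairs is to skip the PED merge and compare genotype matrices directly. The following is only a minimal sketch, not PLINK's IBD estimate: it computes plain genotype concordance, which should be enough to flag the same sample or identical twins, but it will not classify non-identical twins or other relatives. It assumes genotypes have already been pulled from the database as alternate-allele counts (0, 1, 2) with -1 for a missing call. The function names, the missing code, and the 0.98 threshold are all hypothetical choices for illustration.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype call


def genotype_concordance(new, db):
    """Concordance for every (new sample, database sample) pair.

    new: array of shape (n_new, n_snps), db: array of shape (n_db, n_snps).
    Returns an (n_new, n_db) array: the fraction of SNPs with identical
    genotypes among SNPs called in both samples (NaN if none overlap).
    """
    new = np.asarray(new)
    db = np.asarray(db)
    # Number of SNPs called in both samples of each pair.
    valid = (new != MISSING).astype(np.int32) @ (db != MISSING).astype(np.int32).T
    # Number of SNPs where both samples carry the same genotype.
    matches = np.zeros_like(valid)
    for g in (0, 1, 2):
        matches += (new == g).astype(np.int32) @ (db == g).astype(np.int32).T
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(valid > 0, matches / valid, np.nan)


def flag_duplicates(new, db, threshold=0.98):
    """List (new_index, db_index, concordance) for pairs at or above threshold.

    The default threshold is an arbitrary placeholder; it should be tuned
    to the genotyping error rate of the panel.
    """
    conc = genotype_concordance(new, db)
    rows, cols = np.nonzero(conc >= threshold)
    return [(int(i), int(j), float(conc[i, j])) for i, j in zip(rows, cols)]
```

Because only the new-by-database block is computed, the cost grows with n_new * n_db rather than with all pairs, and the database matrix can be processed in chunks if it does not fit in memory. New samples that pass the check can then be appended to the database without recomputing anything for the existing samples.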