Relatedness software for 20,000 exome sequencing datasets.
6.0 years ago
rjobmc • 0

Hi all,

I have a dataset of around 20,000 exomes and I need a quick and efficient tool to look for relatedness across all of my samples. I have tried "relatedness2" from vcftools, but it is very slow: it works pairwise, so the number of comparisons grows quadratically with the number of samples.

We use multi-sample vcf.gz files. Does anyone know of a (possibly C++ based) tool to do this? According to 23andMe, only around 1000 SNVs are needed to find related samples efficiently. We are looking for a tool that lets us find, in one run if possible, any samples that are related up to second degree as well as unknown duplicate samples. I have done a pretty thorough search but have not found anything.

Can anyone help?

Many thanks in advance

vcf relatedness • 1.4k views
6.0 years ago
Shicheng Guo ★ 9.6k

Select only tag SNPs and then run relatedness2 on them; the fewer SNPs you use, the faster it is.

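plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out output
plink --vcf input.vcf --extract output.prune.in --recode vcf --out output_pruned

The third parameter, 0.8, is the r² threshold; you can try 0.6, 0.5 or other values to control the number of SNPs. The first command only writes the list of retained SNPs to output.prune.in, so a second run with --extract is needed to get the pruned VCF (this relies on the variants having unique IDs). plink is a piece of software you cannot ignore in population genetic analysis.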
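Since the question asks for duplicates and relatives up to second degree, here is a rough sketch of how the relatedness2 output could be filtered afterwards. It assumes the usual tab-separated layout with one header line and the kinship estimate (RELATEDNESS_PHI) in column 7, and it uses the cut-offs from the KING inference criteria (> 0.354 duplicate/MZ twin, 0.177 to 0.354 first degree, 0.0884 to 0.177 second degree). The function name classify_kinship is made up for illustration; check the column order against your own file before relying on it.

```shell
# Classify sample pairs from a vcftools --relatedness2 table by kinship (phi).
# Assumed columns (tab-separated, one header line):
#   INDV1 INDV2 N_AaAa N_AAaa N1_Aa N2_Aa RELATEDNESS_PHI
# Cut-offs are the KING inference criteria; adjust them if your data need it.
classify_kinship() {
  awk -F'\t' '
    NR == 1  { next }                       # skip the header line
    $1 == $2 { next }                       # skip self-comparisons
    {
      # Build an order-independent key so A/B and B/A are reported once.
      key = ($1 < $2) ? ($1 SUBSEP $2) : ($2 SUBSEP $1)
      if (key in seen) next
      seen[key] = 1

      phi = $7 + 0
      if      (phi > 0.354)  label = "duplicate_or_MZ_twin"
      else if (phi > 0.177)  label = "first_degree"
      else if (phi > 0.0884) label = "second_degree"
      else next                             # more distant or unrelated

      printf "%s\t%s\t%.4f\t%s\n", $1, $2, phi, label
    }
  ' "$1"
}
```

Called as classify_kinship out.relatedness2, it prints one line per related pair (sample 1, sample 2, phi, label), so duplicates and close relatives come out of the same pass.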


Hi Shicheng Guo, where can I get these tag SNPs for both WES and WGS? I presume the selection would be based on an LD threshold (r² < 0.2 or something like that)? Thanks for your help.
