Intersect variants from VCF files (2500 individuals)
2.4 years ago

Hi!

I'm looking for variants shared by 5% of all VCF files. The total number of VCF files is 2500 (i.e. WGS data from 2500 individuals).

So I tried bedtools and bcftools. Both have an "intersection" function, and I think bcftools intersection (isec) is the best fit for me.

So I ran bcftools:

bcftools isec -n-24 -c all -p path/to/dir ${VCF_LIST}

-n-24: "variants present in 24 or fewer of the 2500 files"
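Note that 5% of 2500 files is 2500 × 0.05 = 125 files, while 24 files is just under 1% (24 / 2500 = 0.96%). A minimal bash sketch for deriving the `-n-` cutoff from a percentage; `max_files_for_percent` is a hypothetical helper name, and shell integer division rounds down:

```bash
#!/usr/bin/env bash
# Hypothetical helper: largest number of files that is still <= PERCENT% of TOTAL.
# Uses shell integer arithmetic, so the result is rounded down.
max_files_for_percent() {
    local total=$1 percent=$2
    echo $(( total * percent / 100 ))
}

# Usage: compute the cutoff, then pass it to isec as "-n-<cutoff>"
# cutoff=$(max_files_for_percent 2500 5)
```

For example, `max_files_for_percent 2500 5` prints `125`.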

But an error occurred: "Could not load the index"

I think this error was caused by Linux's open-file limit (ulimit -u: 4096 [max user processes], ulimit -n: 385778 [open files]).
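`ulimit -u` caps processes, not files; for this question only `ulimit -n` matters. With a soft limit of 385778, even two descriptors per input (the VCF plus its index, an assumed upper bound) come to about 5000, far below the limit, so a missing index is the likelier cause. A small bash check along these lines; `check_open_file_limit` is a hypothetical name and the factor of 2 per file is an assumption:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the number of VCFs listed in a file against the
# soft open-file limit (ulimit -n), assuming up to 2 descriptors per input
# (the VCF itself plus its index).
check_open_file_limit() {
    local list=$1
    local n_files limit needed
    n_files=$(grep -c . "$list")        # count non-empty lines
    needed=$(( n_files * 2 ))
    limit=$(ulimit -n)
    if [ "$limit" = "unlimited" ] || [ "$needed" -lt "$limit" ]; then
        echo "ok: ${n_files} files, ~${needed} descriptors, limit ${limit}"
    else
        echo "too many: ${n_files} files, ~${needed} descriptors, limit ${limit}"
        return 1
    fi
}

# Usage: check_open_file_limit "${VCF_LIST}"
```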

How can I fix it?

Could somebody please help me?

Thank you all


"Could not load the index"

All VCF files must be indexed with bcftools before using isec:

while IFS= read -r F; do bcftools index -f "${F}"; done < "${VCF_LIST}"
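`bcftools index` needs bgzip-compressed VCF (`.vcf.gz`) or BCF input and writes a `.csi` index by default (`.tbi` with `-t`). Before rerunning isec, a quick way to list inputs that still lack either index; `list_unindexed` is a hypothetical helper, and it only checks that an index file exists next to each VCF, not that the index is up to date:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every path in LIST (one per line) that has
# neither a .csi nor a .tbi index file next to it.
list_unindexed() {
    local list=$1 f
    while IFS= read -r f; do
        [ -z "$f" ] && continue                      # skip blank lines
        if [ ! -e "${f}.csi" ] && [ ! -e "${f}.tbi" ]; then
            printf '%s\n' "$f"
        fi
    done < "$list"
}

# Usage: list_unindexed "${VCF_LIST}"
```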

But later, you might still have problems opening 2500 files at once.


Unless the error was actually "bcftools cannot load the index because too many files were open".

2.4 years ago

Merge the VCFs in batches of sqrt(2500) = 50 files:

How to merge 20K single-sample VCFs *without* using plink or plink2?
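A minimal bash sketch of the two-level merge, assuming GNU `split`, that every input is a bgzipped and indexed VCF, and that sample names are unique across files (`bcftools merge` refuses duplicate sample names unless forced). `make_batches` and `merge_in_batches` are hypothetical names. With 2500 inputs and batches of 50, no single `bcftools merge` call opens more than 50 VCFs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split LIST (one VCF path per line) into batch lists of
# SIZE paths each, named OUTDIR/batch_000, OUTDIR/batch_001, ... (GNU split).
make_batches() {
    local list=$1 size=$2 outdir=$3
    mkdir -p "$outdir"
    split -l "$size" -d -a 3 "$list" "$outdir/batch_"
}

# Hypothetical helper: merge each batch list into one multi-sample VCF, then
# merge those per-batch VCFs into OUTDIR/all.vcf.gz.
merge_in_batches() {
    local batchdir=$1 outdir=$2 b out
    mkdir -p "$outdir"
    : > "$outdir/merged.list"                    # start an empty list
    for b in "$batchdir"/batch_*; do
        out="$outdir/$(basename "$b").vcf.gz"
        bcftools merge -l "$b" -Oz -o "$out"     # -l: read input paths from file
        bcftools index -f "$out"
        printf '%s\n' "$out" >> "$outdir/merged.list"
    done
    # Second level: only as many inputs as there were batches (50 here).
    bcftools merge -l "$outdir/merged.list" -Oz -o "$outdir/all.vcf.gz"
    bcftools index -f "$outdir/all.vcf.gz"
}

# Usage (directory names are placeholders):
# make_batches "${VCF_LIST}" 50 batches
# merge_in_batches batches merged
```

Keeping the batch lists and the merged outputs in separate directories matters: the `batch_*` glob would otherwise also pick up the merged `.vcf.gz` files.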

Then, in the final VCF, search for variants with 24 genotypes carrying the ALT allele (e.g. bcftools view -i 'AC==24').
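`AC` counts ALT alleles rather than samples, so a homozygous-ALT carrier adds 2, and on a biallelic diploid site `AC==24` matches anywhere from 12 to 24 carriers; it also selects exactly 24, whereas the question asks for 24 or fewer. Counting carrier samples directly is one alternative, assuming a bcftools release whose filtering expressions support `N_PASS()` and `GT="alt"` (worth checking against the documentation for your version). `filter_by_carriers` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep sites where between 1 and MAX samples carry at least
# one ALT allele (GT="alt"), counted per sample with N_PASS().
filter_by_carriers() {
    local in=$1 max=$2 out=$3
    bcftools view \
        -i "N_PASS(GT=\"alt\")>=1 && N_PASS(GT=\"alt\")<=${max}" \
        -Oz -o "$out" "$in"
}

# Usage: sites carried by at most 24 of the 2500 samples
# filter_by_carriers merged/all.vcf.gz 24 rare.vcf.gz
```

Missing genotypes (`./.`) do not match `GT="alt"`, so sites absent from a sample's original VCF are simply not counted as carriers.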


Thank you, @Pierre Lindenbaum, I'll check as soon as possible!
