Hi, I attempted to find overlapping variants across my four panels using
bcftools isec -p dir -n=4 <file1.vcf.gz> <file2.vcf.gz> <file3.vcf.gz> <file4.vcf.gz>
I got this list of outputs:
0000.vcf
0001.vcf
0002.vcf
0003.vcf
README.txt
sites.txt
Where the README.txt says:
dir/0000.vcf for stripped <file1.vcf.gz>
dir/0001.vcf for stripped <file2.vcf.gz>
dir/0002.vcf for stripped <file3.vcf.gz>
dir/0003.vcf for stripped <file4.vcf.gz>
My questions are: 1) What does “for stripped” mean? 2) Is sites.txt the list of variants shared across all four panels? I’ve searched for documentation that explains these outputs but haven’t found any. Thanks for your help!
Got it. Thanks for your help!
Then, after the stripping, will all four output VCF files be identical? If not, how will the stripped VCF files differ from one another? Please advise.
I can't say for certain off the top of my head, but my best guess is no, the four files won't be identical. Their loci (CHROM, POS, REF, ALT) should match (if the intersection was done on all four fields), but the other information won't necessarily match, because each output file takes it directly from its own input VCF.
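If you want to check this on your own output rather than rely on a guess, here is a rough Python sketch. It assumes the four files are the uncompressed, tab-delimited dir/0000.vcf to dir/0003.vcf listed in the README; the helper names read_loci and same_loci are made up for illustration and are not part of bcftools. It compares only the CHROM, POS, REF and ALT columns, so any differences in ID, QUAL, FILTER, INFO or the sample columns are ignored.

```python
from pathlib import Path


def read_loci(lines):
    """Return the (CHROM, POS, REF, ALT) tuples of a VCF, in file order."""
    loci = []
    for line in lines:
        # Skip meta-information lines, the #CHROM header line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
        loci.append((fields[0], fields[1], fields[3], fields[4]))
    return loci


def same_loci(vcfs):
    """True if every VCF (an iterable of lines) lists the same loci in the same order."""
    all_loci = [read_loci(vcf) for vcf in vcfs]
    return all(loci == all_loci[0] for loci in all_loci[1:])


if __name__ == "__main__":
    # Assumed output location, taken from the README.txt in the question.
    paths = [Path("dir") / f"000{i}.vcf" for i in range(4)]
    if all(p.exists() for p in paths):
        contents = [p.read_text().splitlines() for p in paths]
        print("Loci identical across files:", same_loci(contents))
        print("Files identical line by line:",
              all(c == contents[0] for c in contents[1:]))
```

If the first check prints True and the second prints False, that would be consistent with the guess above: same sites in every file, but per-file annotations carried over from each input.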