Genotype concordance for vcf.gz
4.7 years ago
catarinaglmg ▴ 10

Hi! Does anyone know of a tool for comparing genotypes between two compressed VCF files (.vcf.gz)? (Or any ideas on how I can do it without decompressing them?)

Thanks!

vcf genotypes • 2.1k views
4.7 years ago

How do you want to compare them? You can potentially just do it with BCFtools and/or AWK indexed arrays, but please tell us what you want to do.


I want to confirm that the genotypes in my original VCF file are the same as those in the annotated version of that file. I'm also not entirely sure whether both files are sorted (which can be a problem for some comparison tools). I just want to check that I didn't lose information in the process. (It is a multisample file.)

Thanks!


I see... I would do the following:

  1. Sort the VCFs via bcftools sort
  2. Split multiallelic sites, left-align indels, and check variants against the reference genome (see here: A: Merging vcf files (intersection and union))
  3. Set the ID field in each to a unique identifier (e.g. CHROM:POS:REF:ALT via bcftools annotate -x ID -I +'%CHROM:%POS:%REF:%ALT')
  4. Compare the IDs in each file. I would use AWK indexed arrays.
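
If AWK is not convenient, steps 3 and 4 could also be sketched in Python. This is only a rough illustration, not a replacement for the BCFtools steps above: it assumes both files have already been normalised (step 2) so that each CHROM:POS:REF:ALT key is unique, that they fit in memory, and that GT strings can be compared as plain text (so 0/1 and 1/0, or 0|1, would be reported as different). The function names `read_genotypes` and `compare_genotypes` are made up for this example. It reads the .vcf.gz files directly through `gzip.open`, so nothing has to be decompressed on disk, and because it looks records up by key, the two files do not need to be in the same order.

```python
import gzip


def read_genotypes(path):
    """Return {'CHROM:POS:REF:ALT': {sample: GT}} for one (b)gzipped VCF."""
    genotypes = {}
    samples = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample names follow the FORMAT column
                continue
            if len(fields) < 10:
                continue  # record without genotype columns
            chrom, pos, _id, ref, alt = fields[:5]
            key = f"{chrom}:{pos}:{ref}:{alt}"
            fmt = fields[8].split(":")
            if "GT" not in fmt:
                continue  # no genotype to compare at this site
            gt_index = fmt.index("GT")
            genotypes[key] = {
                sample: call.split(":")[gt_index]
                for sample, call in zip(samples, fields[9:])
            }
    return genotypes


def compare_genotypes(path_a, path_b):
    """Compare two VCFs by variant key, independent of record order."""
    a = read_genotypes(path_a)
    b = read_genotypes(path_b)
    only_a = sorted(set(a) - set(b))  # variants missing from file B
    only_b = sorted(set(b) - set(a))  # variants missing from file A
    mismatches = []
    for key in sorted(set(a) & set(b)):
        for sample in sorted(set(a[key]) | set(b[key])):
            gt_a = a[key].get(sample)  # None if the sample is absent
            gt_b = b[key].get(sample)
            if gt_a != gt_b:
                mismatches.append((key, sample, gt_a, gt_b))
    return only_a, only_b, mismatches
```

For example, `only_a, only_b, mismatches = compare_genotypes("original.vcf.gz", "annotated.vcf.gz")` (placeholder file names) should give three empty lists if the annotation step kept every variant and every genotype.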

Thanks Kevin! I solved it with bcftools sort (there's no need to uncompress the files to sort them) and then bcftools stats -v to compare the two files.


Great - thanks! I was not aware of BCFtools stats.
