compare groups/populations - vcf files
6.2 years ago
ti&te ▴ 40

Hi, I would like to compare the VCF files of two groups (50 samples, i.e. 50 VCF files, per group) and generate a report comparing the groups. The result should list each SNV together with its frequency in each group. Any idea what the most suitable tool might be?

Many thanks.

vcf next-gen SNP • 4.5k views

Many thanks for all the suggestions. :)

6.2 years ago

This sort of basic genotype query can be done efficiently with plink 1.9 or 2.0 (https://www.cog-genomics.org/plink2). "plink2 --vcf [VCF filename] --keep [file listing the sample IDs in one group] --freq" writes all variant frequencies for one group; run it once per group to get both sets of frequencies. --extract/--write-snplist can be used together with the built-in variant filtering options to select the variants you're interested in.

As a general rule, when you only care about the genotype calls and not about other fields like genotype quality, read depth, etc., plink2 is very likely to be far more efficient than general-purpose VCF-handling tools (especially if you convert the VCF to plink2-format first).
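
To make the per-group frequency idea concrete, here is a minimal pure-Python sketch of the computation itself. It is not how plink2 works internally and is far slower; it only illustrates what "frequency of each variant in each group" means. It assumes a single multi-sample VCF with a GT field, skips multiallelic records, and ignores missing calls. The function name `group_alt_freqs` and the sample IDs are made up for illustration.

```python
def group_alt_freqs(vcf_lines, groups):
    """Yield (chrom, pos, ref, alt, {group: alt_freq}) for biallelic records.

    vcf_lines: iterable of VCF text lines (tab-separated).
    groups:    dict mapping a group name to a list of sample IDs.
    A group's frequency is None when it has no called alleles at a site.
    """
    sample_cols = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each group to the column indices of its samples.
            index = {name: i for i, name in enumerate(fields) if i >= 9}
            sample_cols = {
                g: [index[s] for s in ids if s in index]
                for g, ids in groups.items()
            }
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # assumption: only biallelic sites are handled here
        gt_index = fields[8].split(":").index("GT")
        freqs = {}
        for g, cols in sample_cols.items():
            alt_n = total = 0
            for c in cols:
                gt = fields[c].split(":")[gt_index]
                for allele in gt.replace("|", "/").split("/"):
                    if allele == ".":
                        continue  # missing call: not counted
                    total += 1
                    if allele == "1":
                        alt_n += 1
            freqs[g] = alt_n / total if total else None
        yield chrom, int(pos), ref, alt, freqs


# Tiny made-up example: two samples per group.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "A1", "A2", "B1", "B2"]
example_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(header),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT",
               "0/1", "1/1", "0/0", "0/1"]),
    "\t".join(["1", "200", ".", "C", "T", ".", "PASS", ".", "GT",
               "0/0", "./.", "1|1", "0/1"]),
]
example_groups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}

for record in group_alt_freqs(example_vcf, example_groups):
    print(record)
```

For 100 real samples, a dedicated tool will be much faster; the sketch is mainly useful for checking your understanding of the output on a handful of sites.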

6.2 years ago
Vitis ★ 2.6k

If you're only interested in variant sites shared between the populations, bedtools intersect (https://bedtools.readthedocs.io/en/latest/) will find the shared sites. You can then query their genotypes and frequencies in each population with bcftools (https://samtools.github.io/bcftools/bcftools.html).
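
As a rough illustration of what "shared sites" means, here is a small Python sketch that collects variant keys from two VCFs and intersects them. It is an assumption-laden stand-in, not a replacement for bedtools: it matches on exact chromosome, position, REF and ALT (stricter than the positional overlap bedtools intersect reports) and assumes both files were normalized the same way. The helper name `variant_keys` is hypothetical.

```python
def variant_keys(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            # Each ALT allele of a multiallelic record gets its own key.
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Made-up records for two populations (sites only, no genotype columns).
pop1 = ["#CHROM\tPOS\tID\tREF\tALT",
        "1\t100\t.\tA\tG",
        "1\t200\t.\tC\tT,G"]
pop2 = ["#CHROM\tPOS\tID\tREF\tALT",
        "1\t200\t.\tC\tT",
        "2\t50\t.\tG\tA"]

keys1, keys2 = variant_keys(pop1), variant_keys(pop2)
shared = keys1 & keys2        # present in both populations
only_pop1 = keys1 - keys2     # seen only in the first population
only_pop2 = keys2 - keys1     # seen only in the second population
```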

If you're also interested in population-specific variants, I would suggest doing a joint variant call with all individuals from all populations. Individuals can be labeled to retain their population membership. The VCF file from the joint call can then be queried in any way you like. I'd recommend the BGT tool (https://github.com/lh3/bgt) for such tasks.
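
Once per-population allele counts are available from a joint call, flagging population-specific variants is simple bookkeeping. The sketch below is only an illustration under stated assumptions: the input is a plain dictionary of ALT allele counts per population (however you obtained them), and the function name `classify_variants` and the labels are made up here, not part of BGT or bcftools.

```python
def classify_variants(alt_counts):
    """Label each variant as 'shared', 'absent', or 'specific:<populations>'.

    alt_counts: {variant_key: {population: ALT allele count}}
    """
    labels = {}
    for variant, counts in alt_counts.items():
        carriers = sorted(pop for pop, n in counts.items() if n > 0)
        if not carriers:
            labels[variant] = "absent"            # ALT not seen anywhere
        elif len(carriers) == len(counts):
            labels[variant] = "shared"            # ALT seen in every population
        else:
            labels[variant] = "specific:" + ",".join(carriers)
    return labels


# Made-up counts for two populations.
example_counts = {
    ("1", 100, "A", "G"): {"pop1": 3, "pop2": 1},
    ("1", 200, "C", "T"): {"pop1": 0, "pop2": 3},
    ("2", 50, "G", "A"): {"pop1": 0, "pop2": 0},
}
example_labels = classify_variants(example_counts)
```

Note that with modest sample sizes a variant seen in only one group may simply be rare, so a "specific" label from this kind of check is a starting point rather than evidence of a real population difference.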
