Question

How to combine variants?

Asked 8.9 years ago by SOHAIL

Hi,

I have two files of bi-allelic variants. My objective is to combine all the genotypes/samples at the sites that are common to both files.

Can someone please suggest a tool, and the steps to do this?

Thanks

ngs variant manipulation

It's unclear to me what the two files have in common. Do they contain the same variants, or the same samples?

Reply by WouterDeCoster, 8.9 years ago

Hi WouterDeCoster,

Given: two different files
1. 1000G bi-allelic SNPs
2. Bi-allelic SNPs from my own samples

Problem:

  1. Collect only the variants that intersect between the two files (i.e. output the sites common to both).
     Result: two files, (1) the intersecting variants from 1000G with their 1000G genotype information, and (2) the same variants from my samples with my own samples' genotypes.

  2. Combine those shared variants into a single VCF file with the same sites and the union of all samples.

In short: the common variants first, then the union of all samples. Thanks!

Reply by SOHAIL, 8.9 years ago

score 0 · Answer 1 · 2016-11-05

I would solve problem one by generating an identifier for each variant (preferably in the smaller file) by concatenating the chromosome, position and alternative allele. You can then use those identifiers to filter the second file.

e.g.:

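#Get the identifiers present in yourfile.vcf (grep -v drops the header lines, which cut would otherwise pass through into the list)
bcftools annotate --set-id '%CHROM\_%POS\_%ALT' yourfile.vcf | grep -v '^#' | cut -f3 > MyIdentifiers.txt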

#Give the same type of identifiers to the 1000G data vcf
bcftools annotate --set-id '%CHROM\_%POS\_%ALT' 1000Gdata.vcf > 1000Gdata_withidentifiers.vcf

#Filter the 1000G data to only contain the variants you have in your vcf (GATK 3.x syntax)
java -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant 1000Gdata_withidentifiers.vcf -o 1000G_myvariants.vcf -IDs MyIdentifiers.txt
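If GATK is not available, the identifier idea above can also be sketched with plain awk. This is a minimal sketch, not a replacement for the tools above: it assumes uncompressed VCFs, one ALT allele per record (as in bi-allelic files) and identical chromosome naming (e.g. `1` vs `chr1`) in both files, and `common_sites` is a hypothetical helper name. It builds the key from CHROM, POS, REF and ALT; adding REF to the CHROM_POS_ALT key used above is a deliberate choice that makes matching stricter, mainly for indels.

```shell
# Hypothetical helper: print the records of VCF $1 whose CHROM_POS_REF_ALT key
# also occurs in VCF $2, keeping $1's header lines unchanged.
# Assumes uncompressed, bi-allelic input with matching chromosome names.
common_sites() {
  awk -F'\t' '
    # First file read ($2): remember the key of every data line
    FILENAME == ARGV[1] {
      if ($0 !~ /^#/) seen[$1 "_" $2 "_" $4 "_" $5] = 1
      next
    }
    # Second file read ($1): keep its header lines as they are
    /^#/ { print; next }
    # ...and only the data lines whose key was seen in $2
    {
      key = $1 "_" $2 "_" $4 "_" $5
      if (key in seen) print
    }
  ' "$2" "$1"
}
```

Running it in both directions gives the two files from problem one, e.g. `common_sites 1000Gdata.vcf yourfile.vcf > 1000G_common.vcf` and `common_sites yourfile.vcf 1000Gdata.vcf > yourfile_common.vcf` (output names are placeholders). The keys of the second argument are held in memory, so the run that indexes the full 1000G file needs noticeably more RAM; compressed input can be decompressed first, e.g. with `gunzip -c`.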

Problem two can probably be solved easily with something like vcf-merge from VCFtools.
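For real data, vcf-merge (which expects bgzip-compressed, tabix-indexed input) or `bcftools merge` are the robust choices, since they also merge the two headers. Purely to illustrate what the merge does once both files share the same sites, here is a minimal awk sketch; `merge_samples` is a hypothetical name, and it assumes uncompressed input already restricted to the same sites (e.g. with the commands above). It keeps the first file's header and records and appends the second file's sample columns to each matching record.

```shell
# Hypothetical helper: append the sample columns of VCF $2 to the matching
# records of VCF $1 (matched on CHROM_POS_REF_ALT). Simplifications: only $1's
# ## lines are kept; records of $1 with no match in $2, or with a different
# FORMAT column, are dropped; duplicate sample names are not checked.
merge_samples() {
  awk -F'\t' -v OFS='\t' '
    # First file read ($2): store its sample names and per-site sample columns
    FILENAME == ARGV[1] {
      if ($0 ~ /^##/) next
      cols = ""
      for (i = 10; i <= NF; i++) cols = cols OFS $i
      if ($0 ~ /^#CHROM/) { extra_names = cols; next }
      key = $1 "_" $2 "_" $4 "_" $5
      fmt[key] = $9
      extra[key] = cols
      next
    }
    # Second file read ($1): copy the ## lines, extend the #CHROM line
    /^##/ { print; next }
    /^#CHROM/ { print $0 extra_names; next }
    # Data lines: append the stored columns when site and FORMAT both match
    {
      key = $1 "_" $2 "_" $4 "_" $5
      if ((key in extra) && fmt[key] == $9) print $0 extra[key]
    }
  ' "$2" "$1"
}
```

For example, `merge_samples 1000G_myvariants.vcf yourfile_common.vcf > merged.vcf`, where `yourfile_common.vcf` is a placeholder for your own VCF restricted to the shared sites. Because only the first file's `##` lines are kept, INFO or FORMAT definitions that appear only in the second header are lost, which is one more reason to prefer the dedicated tools on real data.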