Combining VCF files so that data at the same loci are merged
7.0 years ago
spiral01 ▴ 110

I have VCF files that I want to combine so that variants at matching positions are merged into a single record. For example, if one file has a variant at position 123 and another file also has a variant there, I want the genotype information from both files to be combined for that position.

The actual variants will be the same (T->G in one file will always be T->G in the other) as they have been created using the same reference data.

Is there a tool that can do this in one go?

SNP • 2.5k views
7.0 years ago

Is each file a different sample? If so, it sounds like GATK's CombineVariants tool would fit your purpose.
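
For illustration, a minimal sketch of what a CombineVariants call might look like. It assumes a GATK 3.x jar (CombineVariants is a GATK 3 tool), and the jar path, the `combine_two` helper, the `DRY_RUN` switch, and all file names are hypothetical placeholders rather than anything from this thread.

```shell
#!/bin/sh
# Sketch: combine two single-sample VCFs with GATK 3.x CombineVariants.
# Set DRY_RUN=1 to print the command instead of running it.
run=${DRY_RUN:+echo}

# Path to the GATK 3.x jar (hypothetical default; override via the environment).
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}

# combine_two REF VCF_A VCF_B OUT
combine_two() {
    ref=$1
    vcf_a=$2
    vcf_b=$3
    out=$4
    # -R is the reference FASTA; each input VCF is passed with --variant.
    $run java -jar "$GATK_JAR" -T CombineVariants \
        -R "$ref" --variant "$vcf_a" --variant "$vcf_b" -o "$out"
}

# Only run when the placeholder inputs actually exist.
if [ -f reference.fa ] && [ -f sample_a.vcf ] && [ -f sample_b.vcf ]; then
    combine_two reference.fa sample_a.vcf sample_b.vcf combined.vcf
fi
```

Running it with `DRY_RUN=1` first is a cheap way to check the command line before starting a long job.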

Thanks for your reply. Yes, each file is a single individual. I am trying to combine the Vindija and Altai Neanderthal VCF data. Both were created using hg19 as the reference, and I just want to combine the two VCF files into one.

I think that should do the trick for you then. Let me know if you have any issues.

GATK asks for a reference genome in FASTA format. In this case I need the hg19 reference (obtained here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). Is chromFa.tar.gz the correct reference file for the hg19 build?

You'll want to download the hg19.2bit file and then use UCSC's twoBitToFa utility to convert it to a FASTA file.
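
A rough sketch of the conversion, assuming UCSC's `twoBitToFa` and `samtools` are both installed and on your PATH. The `twobit_to_fasta` helper and the `DRY_RUN` switch are hypothetical names used only for illustration. GATK also expects a `.fai` index and a `.dict` sequence dictionary alongside the FASTA; this sketch only builds the index.

```shell
#!/bin/sh
# Sketch: convert a UCSC .2bit file to FASTA and index it.
# Set DRY_RUN=1 to print the commands instead of running them.
run=${DRY_RUN:+echo}

# twobit_to_fasta IN.2bit OUT.fa
twobit_to_fasta() {
    twobit=$1
    fasta=$2
    $run twoBitToFa "$twobit" "$fasta"   # 2bit -> FASTA
    $run samtools faidx "$fasta"         # writes the index as "$fasta.fai"
}

# Only run when the downloaded file is present in the current directory.
if [ -f hg19.2bit ]; then
    twobit_to_fasta hg19.2bit hg19.fa
fi
```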

7.0 years ago
spiral01 ▴ 110

Whilst Jared's answer above worked perfectly, I also had success using bcftools merge with the --force-samples argument.
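
For anyone wanting to try the same route, here is a minimal sketch of the kind of command involved. The file names, the `merge_vcfs` helper, and the `DRY_RUN` switch are hypothetical placeholders, and it assumes the inputs are already bgzip-compressed and indexed, which `bcftools merge` requires.

```shell
#!/bin/sh
# Sketch: merge per-sample VCFs so records at the same position share one line.
# Set DRY_RUN=1 to print the command instead of running it.
run=${DRY_RUN:+echo}

# merge_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz [...]
merge_vcfs() {
    out=$1
    shift
    # --force-samples: proceed even if the inputs carry identical sample names
    # -Oz: write the result as bgzip-compressed VCF
    $run bcftools merge --force-samples -Oz -o "$out" "$@"
}

# Inputs must be bgzip-compressed and indexed (e.g. with `bcftools index -t`).
if [ -f vindija.vcf.gz ] && [ -f altai.vcf.gz ]; then
    merge_vcfs neanderthals.merged.vcf.gz vindija.vcf.gz altai.vcf.gz
fi
```

Unlike the GATK route, this needs no reference FASTA, which may make it the simpler option when the inputs are already compressed and indexed.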

Go ahead and accept (green check mark) both your own answer and @Jared's to provide closure to this thread.
