Question

vcf-concat error "column names do not match"

10.1 years ago

Scott ▴ 110

I am new to VCFtools and am having trouble combining VCF files from different sub-populations.

I know it is possible to download such data already combined from the 1000 Genomes data slicer tool, but it cannot handle as many populations in one file as I sometimes need.

I am using the VCFtools vcf-concat script to do this, but I get the error message below.

I am running OS X and using VCFtools from the terminal.

Code:

./vcf-concat CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz > test_out.vcf
The column names do not match; the column "NA06984" no present in [FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz].
 at ./vcf-concat line 32, <__ANONIO__> line 251.
    main::error('The column names do not match; the column "NA06984" no presen...') called at ./vcf-concat line 170
    main::concat('HASH(0x7fd6da0050c8)') called at ./vcf-concat line 12

Both of the files have one site and 99 individuals.

Thank you!

vcf-concat vcftools SNP • 3.6k views

updated 2.8 years ago by Ram 44k • written 10.1 years ago by Scott ▴ 110

Ram · Accepted Answer · 2014-10-07

10.1 years ago

Devon Ryan 104k

You want to merge, not concatenate, them. So use vcf-merge instead.
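
As a rough sketch of what that could look like for the two files in the question: this assumes vcf-merge, bgzip and tabix are all on your PATH and that the inputs are bgzip-compressed (vcf-merge expects tabix-indexed files). The output name merged_CEU_FIN.vcf.gz and the helper function merge_populations are made up for illustration.

```shell
# Sketch only: merge two single-population VCFs into one multi-sample VCF.
# Assumes VCFtools' vcf-merge plus bgzip and tabix are installed, and that
# both inputs are bgzip-compressed. File names are the ones from the question.
ceu=CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz
fin=FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz

# Hypothetical helper: index both inputs, then merge them into "$3".
merge_populations() {
    # vcf-merge needs a tabix index (.tbi) next to each input file
    tabix -f -p vcf "$1" && tabix -f -p vcf "$2" || return 1
    # Samples from the second file become extra columns in the output
    vcf-merge "$1" "$2" | bgzip -c > "$3"
}

# Only run when the tool and both inputs are actually available.
if command -v vcf-merge >/dev/null 2>&1 && [ -f "$ceu" ] && [ -f "$fin" ]; then
    merge_populations "$ceu" "$fin" merged_CEU_FIN.vcf.gz
else
    echo "vcf-merge or the input files were not found; nothing was run" >&2
fi
```

If the files were compressed with plain gzip rather than bgzip, tabix will most likely refuse to index them, so they would need to be recompressed with bgzip first.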

Are you sure? Each file has genotypes for the same single marker. I want a "super-population" of the two populations combined, but still only the single marker.

updated 2.8 years ago by Ram 44k • written 10.1 years ago by Scott ▴ 110

Yes, the last thing you would ever want to do would be to concatenate datasets like that...it'd produce completely useless results. BTW, I suspect part of your confusion arises from misunderstanding the word "concatenate". If you had two files like:

file1:

position1    pop1_sample1 pop1_sample2 pop1_sample3

and file2:

position1    pop2_sample1 pop2_sample2 pop2_sample3

and concatenated them, then you'd duplicate each shared position:

position1    pop1_sample1 pop1_sample2 pop1_sample3
position1    pop2_sample1 pop2_sample2 pop2_sample3

The resulting file isn't a valid VCF. What you want is to add the individual sample calls as new columns, which is what merging does.
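
If it helps to see the difference on the toy files above, here is a small sketch using only standard shell tools. These are plain tab-separated text files, not real VCFs, and the names file1.txt, file2.txt, concatenated.txt and merged.txt are just placeholders. `cat` stands in for what concatenation does, and `join` stands in (very loosely) for what merging does.

```shell
# Toy files from the example above (tab-separated; not real VCF).
printf 'position1\tpop1_sample1\tpop1_sample2\tpop1_sample3\n' > file1.txt
printf 'position1\tpop2_sample1\tpop2_sample2\tpop2_sample3\n' > file2.txt

# Concatenation stacks the rows: position1 now appears twice.
cat file1.txt file2.txt > concatenated.txt

# Merging matches rows on the first field (the position) and appends
# file2's sample columns after file1's.
tab=$(printf '\t')
join -t "$tab" file1.txt file2.txt > merged.txt

echo "concatenated:"
cat concatenated.txt
echo "merged:"
cat merged.txt
```

The merged file keeps a single row for position1 with all six samples as columns, which is the layout you want for a combined "super-population".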

written 10.1 years ago by Devon Ryan 104k

Hi Devon. Thanks for the explanation. I had interpreted merge and concatenate the opposite way round. This might be because I have been using the .ped format a lot, which has individual IDs in the first column and markers/positions as adjacent columns. Thanks again!

updated 2.8 years ago by Ram 44k • written 10.1 years ago by Scott ▴ 110

Ah, that'd certainly cause the confusion!

written 10.1 years ago by Devon Ryan 104k