vcf-concat error "column names do not match"
10.1 years ago
Scott ▴ 110

I am new to VCFtools and am having trouble combining VCF files from different sub-populations.

I know it is possible to download such data already combined from the 1000 Genomes data slicer tool, but it cannot handle as many populations in one file as I sometimes need.

I am using the vcftools vcf-concat function to achieve this, but I am getting the error message below.

I am running OS X and using VCFtools from the terminal.

Code:

./vcf-concat CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz > test_out.vcf
The column names do not match; the column "NA06984" no present in [FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz].
 at ./vcf-concat line 32, <__ANONIO__> line 251.
    main::error('The column names do not match; the column "NA06984" no presen...') called at ./vcf-concat line 170
    main::concat('HASH(0x7fd6da0050c8)') called at ./vcf-concat line 12

Both of the files have one site and 99 individuals.

Thank you!

vcf-concat vcftools SNP
10.1 years ago

You want to merge, not concatenate, them. So use vcf-merge instead.
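
As a rough sketch of what that could look like: vcf-merge expects bgzip-compressed, tabix-indexed inputs, so this assumes your .vcf.gz files were compressed with bgzip rather than plain gzip. The helper name merge_vcfs and the output file name are made up for illustration.

```shell
# Hypothetical helper (not part of vcftools): merge_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz ...
merge_vcfs() {
    out=$1
    shift
    for f in "$@"; do
        # Build a tabix index (.tbi) for any input that does not have one yet
        [ -e "$f.tbi" ] || tabix -p vcf "$f" || return 1
    done
    # vcf-merge writes the merged VCF to stdout; compress it again with bgzip
    vcf-merge "$@" | bgzip -c > "$out"
}

# Example call (CEU.vcf.gz and FIN.vcf.gz stand in for the two files from the question):
#   merge_vcfs CEU_FIN_merged.vcf.gz CEU.vcf.gz FIN.vcf.gz
```

The merged file should then have one row for the site, with the sample columns from both populations side by side.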


Are you sure? Each file has genotypes for the same single marker. I want a "super-population" of the two populations combined, but still only the single marker.

Yes, the last thing you would ever want to do would be to concatenate datasets like that...it'd produce completely useless results. BTW, I suspect part of your confusion arises from misunderstanding the word "concatenate". If you had two files like:

file1:

position1    pop1_sample1 pop1_sample2 pop1_sample3

and file2:

position1    pop2_sample1 pop2_sample2 pop2_sample3

and concatenated them, then you'd duplicate each shared position:

position1    pop1_sample1 pop1_sample2 pop1_sample3
position1    pop2_sample1 pop2_sample2 pop2_sample3

The resulting file isn't a valid VCF. What you want is to add the individual sample calls as new columns, which is what merging does.
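
To make the difference concrete, here is a toy sketch using plain tab-separated files (not real VCFs) and standard Unix tools. It only illustrates rows versus columns: cat stacks the files, while cut and paste append the second file's samples as new columns. It works here only because both toy files contain the same single position in the same order; a real merge also has to match sites and alleles, which is what vcf-merge takes care of. The file names are made up.

```shell
# Toy tab-separated files, not real VCFs: one position, three samples each
work=$(mktemp -d)
printf 'position1\tpop1_sample1\tpop1_sample2\tpop1_sample3\n' > "$work/file1.tsv"
printf 'position1\tpop2_sample1\tpop2_sample2\tpop2_sample3\n' > "$work/file2.tsv"

# "Concatenate": stack the rows, so position1 appears twice
concatenated=$(cat "$work/file1.tsv" "$work/file2.tsv")

# "Merge": keep one row per position and append file2's sample columns
cut -f2- "$work/file2.tsv" > "$work/file2_samples.tsv"
merged=$(paste "$work/file1.tsv" "$work/file2_samples.tsv")

printf '%s\n' "$concatenated"
printf '%s\n' "$merged"
```

The first result has two rows for position1; the second has a single row with the position followed by all six samples.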

Hi Devon. Thanks for the explanation. I had interpreted merge and concatenate the opposite way round. This might be because I have been using the .ped format a lot, which has individual IDs in the first column and markers/positions in the adjacent columns. Thanks again!

Ah, that'd certainly cause the confusion!
