10.1 years ago
Scott
I am new to VCFtools and am having trouble combining VCF files from different sub-populations.
I know such data can be downloaded already combined from the 1000 Genomes data slicer tool, but it cannot handle as many populations in one file as I sometimes require.
I am using the vcftools vcf-concat script to achieve this, but I am getting the error message below.
I am running OS X and using VCFtools through the terminal.
Code:
./vcf-concat CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz > test_out.vcf
The column names do not match; the column "NA06984" no present in [FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz].
at ./vcf-concat line 32, <__ANONIO__> line 251.
main::error('The column names do not match; the column "NA06984" no presen...') called at ./vcf-concat line 170
main::concat('HASH(0x7fd6da0050c8)') called at ./vcf-concat line 12
Both of the files have one site and 99 individuals.
Thank you!
Are you sure? Each file has genotypes for the same single marker. I want a "super-population" of the two populations combined, but still only the single marker.
Yes, the last thing you would ever want to do is concatenate datasets like that; it'd produce completely useless results. BTW, I suspect part of your confusion arises from misunderstanding the word "concatenate". If you had two files, file1 and file2, that cover the same positions and you concatenated them, then you'd duplicate each shared position.
The resulting file isn't a valid VCF. What you want is to add the individual sample calls as new columns, which is what merging does.
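To make the difference concrete, here is a minimal sketch using only standard shell tools on toy tab-separated tables, not real VCFs and not the VCFtools scripts themselves. The sample IDs S1 and S2, the position and the genotypes are made up for illustration, and the "merge" step assumes both files list the same sites in the same order.

```shell
#!/bin/sh
# Toy illustration only: tab-separated tables standing in for VCF bodies.
# Sample IDs (S1, S2), the position and the genotypes are hypothetical.
tmp=$(mktemp -d)

# file1: one site, one sample (S1)
printf '#CHROM\tPOS\tS1\n10\t100\t0|1\n' > "$tmp/file1.tsv"
# file2: the same site, a different sample (S2)
printf '#CHROM\tPOS\tS2\n10\t100\t1|1\n' > "$tmp/file2.tsv"

# "Concatenate": stack the rows of file2 under file1.
# The site 10:100 now appears twice, once per input file.
concat_out=$( { cat "$tmp/file1.tsv"; grep -v '^#' "$tmp/file2.tsv"; } )

# "Merge": keep one row per site and append file2's sample column.
# Assumes identical sites in identical order in both files.
merge_out=$(cut -f3 "$tmp/file2.tsv" | paste "$tmp/file1.tsv" -)

printf 'concatenated:\n%s\n\nmerged:\n%s\n' "$concat_out" "$merge_out"
rm -rf "$tmp"
```

The concatenated table repeats the site under a header that only names S1, while the merged table has a single row for the site with one column per sample. For real files, VCFtools provides a separate vcf-merge script for the column-wise case; as far as I know it expects bgzip-compressed, tabix-indexed inputs, so check its documentation before running it.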
Hi Devon. Thanks for the explanation. I had interpreted their merge and concatenate the opposite way around. This might have been because I have been using the .ped format a lot, which has individual IDs in the first column and markers/positions as adjacent columns. Thanks again!
Ah, that'd certainly cause the confusion!