I am trying to merge VCF files across chromosomes 1-22 using bcftools v1.9. The command I am using is:
bcftools merge 'myfile1.vcf.gz' 'myfile2.vcf.gz' etc.... 'myfile22.vcf.gz' -o myfile1_22.vcf.gz
However, I get the following error: "Error: Duplicate sample names (1310229_1310229), use --force-samples to proceed anyway."
I'm afraid to use --force-samples because I don't understand how it will affect the merged VCF file, or how many duplicates there are. The data is from the UK Biobank and the VCF files are massive (1.3 TB in total across chromosomes).
Any suggestions for actually solving the error rather than using --force-samples?
NOTE: I am very, very new to biostatistical analysis. I would greatly appreciate your advice, and even more so if it is structured for a beginner.
I checked the headers and found that the first sample is 1310229. I think that if I use --force-samples, it will add a prefix to every single sample name, but I don't know why. Any ideas?
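In case it helps, here is a rough sketch of how the sample names in two of the files could be compared to see how many are duplicated. It assumes bcftools is on the PATH and uses the placeholder file names from my command; the helper name duplicated_ids is made up for illustration. As far as I understand, bcftools query -l only lists the sample names, one per line, so it should not need to scan the genotype data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sample names from stdin (one per line)
# and print each name that occurs more than once.
duplicated_ids() {
    sort | uniq -d
}

# Only touch the real data if bcftools and both placeholder files exist.
if command -v bcftools >/dev/null 2>&1 \
    && [ -f myfile1.vcf.gz ] && [ -f myfile2.vcf.gz ]; then
    # List the sample names of both files, then count the names
    # that appear in more than one of them.
    {
        bcftools query -l myfile1.vcf.gz
        bcftools query -l myfile2.vcf.gz
    } | duplicated_ids | wc -l
fi
```

If the count printed equals the number of samples in each file, then every sample is present in both files, not just 1310229_1310229.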