Hi all,
I have thousands of VCF files that I want to merge into a single VCF. For that purpose, I am using bcftools:
bcftools merge --force-samples -l list_files_to_merge -Oz -o ./Merged.vcf.gz
--force-samples is used because some samples may be repeated across VCF files. I do not care about that.
The command works, but it stops when the REF allele in one file does not match the REF allele in the other files. That probably means this VCF was made against a different reference genome, which is fine.
The REF prefixes differ: T vs A (1,1)
Failed to merge alleles at X:15517198 in file_5000.vcf.gz
I would like to be able to skip this VCF (file_5000.vcf.gz in the example) and continue with the merge. Ideally, the log would record which files (and therefore which samples) were skipped because of this error.
Is there a way to do this?
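One way to find the problem files up front, rather than one failed merge at a time, is to check every file against a single reference FASTA before merging. This is a minimal sketch, assuming ref.fa (a hypothetical name) is the genome most of the files were made against. `bcftools norm -c e -f ref.fa` exits with an error at the first REF allele that does not match the FASTA. Note that this only approximates the merge error, which compares the files with each other rather than with a FASTA. The function name screen_vcfs and the output file names are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pre-screen VCFs against one reference FASTA before merging.
# Hypothetical names: screen_vcfs, ref.fa, list_files_ok, omitted_files.log.

# screen_vcfs LIST REF_FASTA OK_LIST LOG
# Writes files that pass the REF check to OK_LIST and the rest to LOG.
screen_vcfs() {
    local list=$1 ref=$2 ok=$3 log=$4 vcf
    : > "$ok"
    : > "$log"
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r vcf || [ -n "$vcf" ]; do
        [ -z "$vcf" ] && continue    # skip blank lines in the list
        # -c e: stop with an error at the first REF allele that does not match
        # the FASTA; the normalized output itself is discarded
        if bcftools norm -c e -f "$ref" -Ou -o /dev/null "$vcf" 2> /dev/null; then
            printf '%s\n' "$vcf" >> "$ok"
        else
            # Any non-zero exit lands here (REF mismatch or any other error)
            printf '%s\tfailed REF check against %s\n' "$vcf" "$ref" >> "$log"
        fi
    done < "$list"
}
```

Usage would be `screen_vcfs list_files_to_merge ref.fa list_files_ok omitted_files.log`, followed by the original merge with `-l list_files_ok`. Each file is checked independently, so the list could also be split into chunks and screened in parallel.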
Why not remove file_5000.vcf.gz from list_files_to_merge?
Thanks for your answer.
Yes, that is in fact what I did.
The problem is that I then have to run the whole merge again (which takes time), and the same thing keeps happening with other VCFs. I do not know how many VCF files have this issue.
I can roughly estimate how often this will happen. The merge reads list_files_to_merge line by line, and it stopped at the files in positions 2342, 2549, 8876.... Since the list has 100k VCFs, this could happen up to 40 times.
So if there is a way to avoid these stops, that would be great.
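If no single reference FASTA is available, the merge itself can be used as the check. You run it, read the failing file name from the error message, drop that file from a working copy of the list, log it, and try again. This is a minimal sketch. It assumes the error text has the form quoted earlier in the thread ("Failed to merge alleles at X:15517198 in file_5000.vcf.gz") and that the printed name matches a line of the list exactly. The function names merge_skipping_bad and extract_failed_file are hypothetical. It does not save compute time, because every retry starts the merge from scratch. What it does is remove the need to restart it by hand, and it leaves a log of the skipped files.

```shell
#!/usr/bin/env bash
# Sketch: rerun bcftools merge, dropping each file that triggers
# "Failed to merge alleles at <chr:pos> in <file>" and logging it.
# Hypothetical names: extract_failed_file, merge_skipping_bad.

# Print the file named in the first "Failed to merge alleles" line of an error log.
extract_failed_file() {
    sed -n 's/.*Failed to merge alleles at [^ ]* in \([^ ]*\).*/\1/p' "$1" | head -n 1
}

# merge_skipping_bad LIST OUT LOG
# LIST is never modified; the retries work on a temporary copy.
merge_skipping_bad() {
    local list=$1 out=$2 log=$3 work err bad before after
    work=$(mktemp)
    err=$(mktemp)
    cp "$list" "$work"
    : > "$log"
    while ! bcftools merge --force-samples -l "$work" -Oz -o "$out" 2> "$err"; do
        bad=$(extract_failed_file "$err")
        if [ -z "$bad" ]; then
            # A different error: show it and stop instead of looping forever
            cat "$err" >&2
            rm -f "$work" "$err"
            return 1
        fi
        printf '%s\n' "$bad" >> "$log"
        # Remove the offending file (exact whole-line match) from the working list
        before=$(grep -c '' "$work")
        grep -v -x -F "$bad" "$work" > "$work.tmp"
        after=$(grep -c '' "$work.tmp")
        if [ "$after" -eq "$before" ]; then
            # The printed name is not a line of the list (e.g. path vs basename)
            echo "could not find $bad in the list; stopping" >&2
            rm -f "$work" "$work.tmp" "$err"
            return 1
        fi
        mv "$work.tmp" "$work"
    done
    rm -f "$work" "$err"
}
```

Usage would be `merge_skipping_bad list_files_to_merge ./Merged.vcf.gz omitted_files.log`. Combining it with a pre-screen step keeps the number of retries low.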