Merge vcf with bcftools error wrong reference
1
0
Entering edit mode
4.1 years ago
chariko ▴ 60

Hi all,

I have thousands of vcf which I want to merge into just one vcf. For that purpose, I am using vcftools.

bcftools merge --force -l list_files_to_merge -Oz -o ./Merged.vcf.gz

--force is used because in some vcf files some samples may be repeated. I do not care about that.

The command works but it stops when one reference does not match the reference with other samples. That will probably mean that this vcf used another reference genome that is ok.

The REF prefixes differ: T vs A (1,1) Failed to merge alleles at X:15517198 in file_5000.vcf.gz

I would like to have the possibility of omitting this vcf (file_5000.vcf.gz in the example) and continue with the merge command. Maybe in the log file it may be written which samples were omitted due to this error.

Is there a way for doing this?

bcftools merge wrong reference • 2.9k views
ADD COMMENT
0
Entering edit mode

why not removing file_5000.vcf.gz from list_files_to_merge ?

ADD REPLY
0
Entering edit mode

Thanks for your answer

Yes, that is what I did in fact.

The problem is that I have to repeat the whole process again (takes time) and this is happening again with more vcfs. I do not know how many vcf files have this issue.

I may calculate the number of times this will happen. I know that the merging process merges reading line by line the "list_files_to_merge" and it stopped with files in position 2342, 2549, 8876.... knowing that the list has 100k vcfs that means this to happen up to 40 times...

Therefore if there is a way for avoiding this stop it could be great.

ADD REPLY
0
Entering edit mode
4.1 years ago

if your vcfs have a dictionary ( lines in the header starting with ##contig=) , you can use my tool http://lindenb.github.io/jvarkit/HowManyBamDict.html that will produce a tab delimted file with the path to the vcf and a checksum for the dictionary.

You'll get a signature for each distinct dictionary. Group the VCFs for each dictionary.

$ find /home/lindenb/src/jvarkit/src/test/resources -name "*.vcf.gz" -o -name "*.bam" -o -name "*.cram"

9a5c58c2c91e731135b27ed14974523a    .   85  3101976562  1=249250621;2=243199373;3=198022430;4=191154276;5=180915260;6=17111
9a5c58c2c91e731135b27ed14974523a    /home/lindenb/src/jvarkit/src/test/resources/ExAC.r1.sites.vep.vcf.gz
4677ece43eea2b029d0d33fe130ea6c7    .   86  3137454505  chr1=249250621;chr2=243199373;chr3=198022430;chr4=191154276;chr5=18
4677ece43eea2b029d0d33fe130ea6c7    /home/lindenb/src/jvarkit/src/test/resources/manta.B00I9CJ.vcf.gz
bd7e0928fc3c810e48fafc53a4222ed5    .   11  18490   RF01=3302;RF02=2687;RF03=2592;RF04=2362;RF05=1579;RF06=1356;RF07=1074;RF
(...)
ADD COMMENT

Login before adding your answer.

Traffic: 1389 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6