bcftools multiallelic split not working
3.2 years ago

I am attempting to split multiallelic sites using bcftools norm with the following command:

zcat ${inputVcf} | \
sed 's/AD,Number=./AD,Number=R/g' | \
sed 's/ADR,Number=./ADR,Number=R/g' | \
sed 's/ADF,Number=./ADF,Number=R/g' | \
bcftools norm \
  --fasta-ref ${genomeFa} \
  --check-ref s \
  --multiallelics -any \
  --output ${outputVcf}

The sed commands follow the recommendation from here. However, I am still getting FORMAT entries such as the following:

GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/0:44:44:56:1,10,5:1,4,2:0,6,3:PASS:511,99,48 ./.:.:.:.:.:.:.:.:. 0/1:53:53:63:0,12,6:0,4,1:0,8,5:PASS:483,210,164

These look clearly multiallelic to me (AD, ADF, and ADR each carry three values). Does anybody know how to fix this?
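
For reference, the header rewrite that the sed steps perform can be checked on its own before running bcftools norm. The following is a minimal sketch, assuming the header declares AD, ADF, and ADR with Number=. as FORMAT fields; the header lines and the fix_header function name are made up for illustration and are not taken from the original file.

```shell
# Hypothetical header lines, written inline so the sketch runs without a VCF file.
header='##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths">
##FORMAT=<ID=ADF,Number=.,Type=Integer,Description="Forward allelic depths">
##FORMAT=<ID=ADR,Number=.,Type=Integer,Description="Reverse allelic depths">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype likelihoods">'

# Rewrite Number=. to Number=R only on the FORMAT lines for AD, ADF and ADR.
# Anchoring on the start of the FORMAT line leaves every other header line untouched.
fix_header() {
  sed -E '/^##FORMAT=<ID=(AD|ADF|ADR),/ s/,Number=\.,/,Number=R,/'
}

fixed=$(printf '%s\n' "$header" | fix_header)
printf '%s\n' "$fixed"
```

In a pipeline like the one above, fix_header would take the place of the three separate sed calls, between zcat and bcftools norm.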

bcftools vcf • 5.7k views
3.2 years ago

Hi, I think you are misinterpreting what a 'multi-allelic' call is. The entry that you posted is not multi-allelic in this sense. A multi-allelic call may look like this (REF, ALT, genotype):

A      G,T    1/2

Thus, the genotype is G/T. After splitting, this would become:

A      G      0/1
A      T      0/1
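
One way to check this on a file is to count the records whose ALT column still lists more than one allele. This is a rough sketch, assuming a standard tab-separated VCF body; the three records and the count_multiallelic function name are hypothetical.

```shell
# Hypothetical three-record VCF body (tab-separated), built inline for illustration.
records=$(printf 'chr1\t100\t.\tA\tG,T\t.\tPASS\t.\tGT\t1/2\n'
          printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\n'
          printf 'chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t1/0\n')

# A record is multi-allelic when column 5 (ALT) contains a comma.
# Header lines, if present, are skipped.
count_multiallelic() {
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}

n_multi=$(printf '%s\n' "$records" | count_multiallelic)
echo "multi-allelic records: $n_multi"
```

On a real file, something like zcat file.vcf.gz | count_multiallelic should work the same way. Only the first record counts here: a 1/0 genotype next to a single ALT allele, as in the third record, is not multi-allelic in this sense.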

Kevin


I think that clarifies things and points me in the right direction. What happened was that the VCF file had been normalized in a previous step, so the ALT column was split, but fields like AD remained as they were because those fields were ignored, and their data types were still wrong. Fixing the upstream bcftools norm step worked for me, and now both my ALT and AD fields are split as I expect.
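
In case it helps anyone else, a mismatch like this can be spotted by comparing the number of AD values with the number of alleles (REF plus ALT) in each record. This is only a sketch, assuming AD is a comma-separated FORMAT field and ALT is never "."; the records and the check_ad_length function name are hypothetical.

```shell
# Hypothetical records: the first has one ALT allele but three AD values (stale),
# the second is consistent.
records=$(printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD\t1/0:1,10,5\t./.:.\n'
          printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/1:7,9\t0/0:12,0\n')

# Print CHROM:POS for every record where some sample has an AD value count
# that differs from the number of alleles in the record.
check_ad_length() {
  awk -F'\t' '
    /^#/ { next }
    {
      n_alleles = split($5, alts, ",") + 1   # ALT alleles plus REF
      n_fmt = split($9, fmt, ":")
      ad = 0
      for (i = 1; i <= n_fmt; i++) if (fmt[i] == "AD") ad = i
      if (ad == 0) next                      # record has no AD field
      for (s = 10; s <= NF; s++) {
        n_f = split($s, vals, ":")
        if (ad > n_f || vals[ad] == ".") continue   # missing sample data
        if (split(vals[ad], depths, ",") != n_alleles) { print $1 ":" $2; break }
      }
    }'
}

mismatched=$(printf '%s\n' "$records" | check_ad_length)
echo "records with unexpected AD length: $mismatched"
```

Any position it prints has an AD field that was not split along with ALT.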


How can I achieve what you described above for a VCF file?

bcftools norm -m-any

If you want to additionally left-align indels, then supply a FASTA reference:

bcftools norm -m-any --check-ref w -f human_g1k_v37.fasta

Take a look at my Step 4, here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2


I have a VCF file that contains only SNP variants (bi- and multi-allelic) after GATK VQSR. Now I want to split the multi-allelic variants into bi-allelic ones. The command I used is:

bcftools norm -m -snps snp.2.vcf.gz -Ov -o output

It then throws an error:

Error: wrong number of fields in INFO/MLEAC at 2:10443, expected 2, found 1

How can I solve it?


First perform bcftools norm -m-any, then VQSR.


An off-topic question: is bcftools norm mentioned in any publication?


You will not be able to find bcftools norm itself in any publication, but you will be able to find bcftools in publications.
