What is the difference between bcftools norm --multiallelics -any and --atomize?
14 months ago
a615ebfb ▴ 40

Hello, forgive my ignorance.

Suppose input.vcf contains a complex multiallelic site.

What is the difference between

bcftools norm --multiallelics -any -f hg38.fa input.vcf

and

bcftools norm --atomize -f hg38.fa input.vcf

I understand what --multiallelics -any does, but I'm not sure what --atomize does. The documentation says "Decompose complex variants, e.g. split MNVs into consecutive SNVs.", but I do not understand what this means for a multiallelic site.

If someone has a good example that would help clarify, that would be great.

Thanks in advance.

14 months ago
Ram 44k

I don't think --atomize is comparable to --multiallelics -any when it comes to multiallelic sites. You can see an example of atomization on a multiallelic site in the documentation for the --atom-overlaps option:

    # Before atomization:
    100  CC  C,GG   1/2

    # After:
    #   bcftools norm -a .
    100  C   G      ./1
    100  CC  C      1/.
    101  C   G      ./1

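To make the example concrete, here is a toy Python model that reproduces the three records above from `100 CC C,GG 1/2`. It is a sketch under simplifying assumptions, not bcftools' implementation: only equal-length multi-base alleles (MNVs) are decomposed, other alleles pass through unchanged, genotypes are assumed unphased, and any other ALT from the original site is written as the overlap marker `.`. The names `atomize` and `recode_gt` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, str, str, str]  # (POS, REF, ALT, GT)


def recode_gt(gt: str, carried: int, overlap: str = ".") -> str:
    """GT for a new record that carries only original ALT number `carried`.

    Assumption: unphased, "/"-separated genotypes only.
    """
    out = []
    for allele in gt.split("/"):
        if allele in (".", "0"):
            out.append(allele)  # missing and REF calls are kept as they are
        elif int(allele) == carried:
            out.append("1")  # the carried allele becomes ALT 1
        else:
            out.append(overlap)  # another ALT from the same original site
    return "/".join(out)


def atomize(pos: int, ref: str, alts: List[str], gt: str) -> List[Record]:
    """Toy --atomize: split equal-length MNV alleles into per-base SNVs."""
    records: List[Record] = []
    for number, alt in enumerate(alts, start=1):
        new_gt = recode_gt(gt, number)
        if len(ref) == len(alt) > 1:
            # One SNV per position where REF and ALT differ.
            for offset, (r, a) in enumerate(zip(ref, alt)):
                if r != a:
                    records.append((pos + offset, r, a, new_gt))
        else:
            records.append((pos, ref, alt, new_gt))  # e.g. the CC>C deletion
    # Sort by POS, then shorter REF first, to match the manual's listing.
    return sorted(records, key=lambda rec: (rec[0], len(rec[1])))


for rec in atomize(100, "CC", ["C", "GG"], "1/2"):
    print(*rec, sep="  ")
```

Running it prints the same three records as the manual. The GG allele turns into two C>G SNV records, while the CC>C deletion, already a single event, is kept as one record.
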
Splitting with --multiallelics -any would just give you two records (I can't tell offhand what the GT field would be):

    100  CC  GG
    100  CC  C

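For contrast, here is a toy model of the REF/ALT side of --multiallelics -any: one biallelic record per ALT, followed by a simplified trimming of bases that REF and ALT share. The trimming is an assumption standing in for the normalization that -f hg38.fa enables; left-alignment against the reference is not modelled, GT is left out, and the function names are hypothetical.

```python
from typing import List, Tuple

Rec = Tuple[int, str, str]  # (POS, REF, ALT)


def trim(pos: int, ref: str, alt: str) -> Rec:
    """Drop bases shared by REF and ALT, keeping at least one base in each."""
    # Shared trailing bases: POS is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Shared leading bases: each one removed moves POS one base to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def split_multiallelic(pos: int, ref: str, alts: List[str]) -> List[Rec]:
    """One biallelic record per ALT allele; MNVs such as CC>GG stay whole."""
    return [trim(pos, ref, alt) for alt in alts]


print(split_multiallelic(100, "CC", ["C", "GG"]))
# [(100, 'CC', 'C'), (100, 'CC', 'GG')]
print(split_multiallelic(100, "GAT", ["GAC", "G"]))
# [(102, 'T', 'C'), (100, 'GAT', 'G')]
```

CC>GG comes through as a single record. The second call shows the kind of case where POS and REF do change: after splitting, GAT>GAC reduces to a T>C SNV at position 102. Records come out in ALT order here, which is a property of the toy rather than a claim about bcftools' output order.
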
Only the ALT field is split; REF and POS change only in certain cases (for example, when bases shared by REF and ALT are trimmed, or an indel is left-aligned against the reference). MNVs are not split into SNVs: CC>GG remains CC>GG. With --atomize, MNVs are split, so you get two C>G records (at positions 100 and 101) instead of one CC>GG record. Note that this split would happen even if the record were not multiallelic.

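The last point can be checked with the same kind of toy model on a biallelic MNV: splitting leaves it alone because there is only one ALT, while atomizing still breaks it into SNVs. `split_only` and `atomize_mnvs` are hypothetical helpers that model only the REF/ALT behaviour described in this answer, not bcftools itself.

```python
from typing import List, Tuple

Rec = Tuple[int, str, str]  # (POS, REF, ALT)


def split_only(pos: int, ref: str, alts: List[str]) -> List[Rec]:
    # Toy --multiallelics -any: one record per ALT, so a biallelic record is untouched.
    return [(pos, ref, alt) for alt in alts]


def atomize_mnvs(pos: int, ref: str, alts: List[str]) -> List[Rec]:
    # Toy --atomize: equal-length multi-base alleles become per-base SNVs.
    out: List[Rec] = []
    for alt in alts:
        if len(ref) == len(alt) > 1:
            out += [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
        else:
            out.append((pos, ref, alt))
    return out


biallelic_mnv = (100, "CC", ["GG"])
print(split_only(*biallelic_mnv))    # [(100, 'CC', 'GG')]
print(atomize_mnvs(*biallelic_mnv))  # [(100, 'C', 'G'), (101, 'C', 'G')]
```
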
Side note: I wonder whether the manual meant bcftools norm -a --atom-overlaps . rather than bcftools norm -a ., since -a (--atomize) takes no argument of its own, but that's not today's problem.
