Can not get bcftools norm to join biallelics into a multiallelic.
0
0
Entering edit mode
20 months ago
a615ebfb ▴ 40

is this the right way to use bcftools to join/merge biallelic records into a multiallelic? If so, it is not working. No errors but it gives me the same file with my command added to the headers.

# Example with multiallelic:

cat test20.vcf 
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1
chr7    117559591   Indel   TCTT    T,TTCT  .   .   .   GT  0/0

# Splitting works OK

###fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##bcftools_normVersion=1.17+htslib-1.17
##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:36:30 2023
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1
chr7    117559590   Indel   ATCT    A   .   .   .   GT  0/0
chr7    117559592   Indel   CT  TC  .   .   .   GT  0/0
Lines   total/split/realigned/skipped:  1/1/2/0

# Splitting and joining but joining does not work.  Why?

bcftools norm -m-any   -f ../hg38/hg38.fa  test20.vcf |  bcftools norm -m+any   -f ../hg38/hg38.fa 
Lines   total/split/realigned/skipped:  1/1/2/0
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##bcftools_normVersion=1.17+htslib-1.17
##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:37:29 2023
##bcftools_normCommand=norm -m+any -f ../hg38/hg38.fa; Date=Sat Mar 11 15:37:29 2023
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1
chr7    117559590   Indel   ATCT    A   .   .   .   GT  0/0
chr7    117559592   Indel   CT  TC  .   .   .   GT  0/0
Lines   total/split/realigned/skipped:  2/0/0/0

The above command correctly splits my multiallelic record into 2 biallelic recordfs and then it should join them into a single multiallelic record but it does not. Am I doing something wrong or is there a bug?

Thanks.

bcftools multiallelic • 1.4k views
ADD COMMENT
0
Entering edit mode

It's hard to troubleshoot for you without one clear unedited copy of the input VCF specified within a single code block. From your post, I couldn't tell what the input file comprises. However, I made a toy vcf like so based on your first example:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230131
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  test
chr7    117559591   Indel   TCTT    T,TTCT  .   .   .   GT  0/0

I can split it like so with bcftools norm -m-any test.vcf:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230131
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type">
##bcftools_normVersion=1.15.1+htslib-1.15.1
##bcftools_normCommand=norm -m-any test.vcf; Date=Mon Mar 13 11:45:40 2023
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  test
chr7    117559591   Indel   TCTT    T   .   .   .   GT  0/0
chr7    117559591   Indel   TCTT    TTCT    .   .   .   GT  0/0

And I can join the output back again with bcftools norm -m+any:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20230131
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type">
##bcftools_normVersion=1.15.1+htslib-1.15.1
##bcftools_normCommand=norm -m-any test.vcf; Date=Mon Mar 13 11:46:18 2023
##bcftools_normCommand=norm -m+any; Date=Mon Mar 13 11:46:18 2023
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  test
chr7    117559591   Indel   TCTT    T,TTCT  .   .   .   GT  0/0

Version details:

bcftools 1.15.1 Using htslib 1.15.1

ADD REPLY
1
Entering edit mode

Thanks for the response and troubleshooting, cfos4698. In my initial split, I used -f hg38.fa so there is normalization going on. It is not the same as what you did.

ADD REPLY

Login before adding your answer.

Traffic: 1961 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6