Entering edit mode
20 months ago
a615ebfb
▴
40
is this the right way to use bcftools to join/merge biallelic records into a multiallelic? If so, it is not working. No errors but it gives me the same file with my command added to the headers.
# Example with multiallelic:
cat test20.vcf
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1
chr7 117559591 Indel TCTT T,TTCT . . . GT 0/0
# Splitting works OK
###fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##bcftools_normVersion=1.17+htslib-1.17
##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:36:30 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1
chr7 117559590 Indel ATCT A . . . GT 0/0
chr7 117559592 Indel CT TC . . . GT 0/0
Lines total/split/realigned/skipped: 1/1/2/0
# Splitting and joining but joining does not work. Why?
bcftools norm -m-any -f ../hg38/hg38.fa test20.vcf | bcftools norm -m+any -f ../hg38/hg38.fa
Lines total/split/realigned/skipped: 1/1/2/0
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr7>
##bcftools_normVersion=1.17+htslib-1.17
##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:37:29 2023
##bcftools_normCommand=norm -m+any -f ../hg38/hg38.fa; Date=Sat Mar 11 15:37:29 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1
chr7 117559590 Indel ATCT A . . . GT 0/0
chr7 117559592 Indel CT TC . . . GT 0/0
Lines total/split/realigned/skipped: 2/0/0/0
The above command correctly splits my multiallelic record into 2 biallelic recordfs and then it should join them into a single multiallelic record but it does not. Am I doing something wrong or is there a bug?
Thanks.
It's hard to troubleshoot for you without one clear unedited copy of the input VCF specified within a single code block. From your post, I couldn't tell what the input file comprises. However, I made a toy vcf like so based on your first example:
I can split it like so with
bcftools norm -m-any test.vcf
:And I can join the output back again with
bcftools norm -m+any
:Version details:
bcftools 1.15.1 Using htslib 1.15.1
Thanks for the response and troubleshooting, cfos4698. In my initial split, I used -f hg38.fa so there is normalization going on. It is not the same as what you did.