Question

bcftools consensus error

7.3 years ago

Petr.Prochazka ▴ 10

Hello, I have the following trouble with bcftools.

My goal is to obtain a real-data text file over the IUPAC alphabet for my text search algorithm. I cannot find any such files on the Internet; only a few artificial text files over the IUPAC alphabet exist. My idea was to create a real-data IUPAC file from a VCF file and the reference FASTA sequence using the bcftools consensus program. I downloaded the necessary data from the 1000 Genomes Project. However, bcftools consensus reports the following error:

Symbolic alleles other than <del> are currently not supported: <cn0> at 9:85501

My questions are:

Is there any way to filter the VCF file and get rid of the unsupported alleles?
Are there any publicly available VCF files that do not contain these alleles unsupported by bcftools consensus?
Is there any publicly available real data stored as a text file over the IUPAC alphabet?
Is there any tool that can transform VCF files to some primitive text form (e.g. "AGTT{AT, CC, C}ACCT", representing 3 variants: "AGTTATACCT", "AGTTCCACCT" and "AGTTCACCT")?

Thank you very much for any idea that can move me one step further.

Petr.

updated 7.3 years ago by Kevin Blighe 88k • written 7.3 years ago by Petr.Prochazka ▴ 10

score 4 · Answer 1 · 2017-08-31

I would run the following on your data (before running bcftools consensus) in order to ensure that your VCF/BCF is in good shape:

bcftools norm -m-any VCF.GZ | bcftools norm -Ov --check-ref w -f REF.FASTA > OUT.VCF

1st command: splits multi-allelic calls into separate variant records
2nd command: left-aligns indels and issues a warning whenever the REF base in your VCF does not match the base in the supplied FASTA reference genome

You can also set the ID field in the VCF with a 3rd command in the pipeline, if you wish:

bcftools annotate -Ov -I +'%ID' #leaves it as the existing ID

or

bcftools annotate -Ob -x ID -I +'%CHROM:%POS:%REF:%ALT' #sets it to chr:pos:ref:alt
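
As far as I can tell, the normalisation above does not by itself remove the symbolic alleles (such as <cn0>) that bcftools consensus complains about. One possible way to drop them is a plain awk filter on the ALT column, run after the multi-allelic split so that ordinary alleles from the same site survive. This is only a minimal sketch: the function name drop_symbolic is made up for illustration, it assumes uncompressed VCF text on stdin, and it also discards <del> records even though bcftools consensus accepts those.

```shell
# drop_symbolic (hypothetical helper): read an uncompressed VCF on stdin and
# write it to stdout without records whose ALT column (field 5) contains a
# symbolic allele such as <CN0>. Header lines (starting with '#') are kept.
drop_symbolic() {
    awk -F'\t' '/^#/ || $5 !~ /</'
}

# Possible usage with the placeholder file names from above:
#   bcftools norm -m-any VCF.GZ \
#     | bcftools norm -Ov --check-ref w -f REF.FASTA \
#     | drop_symbolic > OUT.VCF
```

The filtered file still has to be bgzip-compressed and indexed before bcftools consensus will read it. For the IUPAC output asked about in the question, bcftools consensus has an -I (--iupac-codes) option, which should be worth checking against the version you have installed.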

Hope that this helps.

Kevin