Hello, I have run into the following problem with bcftools.
My goal is to obtain a real-data text file over the IUPAC alphabet for my text search algorithm. I cannot find any such file on the Internet; there are only a few artificial text files over the IUPAC alphabet. My idea was to create a real-data IUPAC file from a VCF file and the reference FASTA sequence using bcftools consensus. I downloaded the necessary data from the 1000 Genomes Project. However, bcftools consensus reports the following error:
Symbolic alleles other than <del> are currently not supported: <cn0> at 9:85501
My questions are:
Is there any way to filter the VCF file and get rid of the unsupported alleles?
Are there any publicly available VCF files that do not contain these alleles that bcftools consensus does not support?
Is there any publicly available real data stored as a text file over the IUPAC alphabet?
Is there any tool that can transform VCF files into some primitive text form (e.g. "AGTT{AT, CC, C}ACCT" representing 3 variants: "AGTTATACCT", "AGTTCCACCT" and "AGTTCACCT")?
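To make the first and the last question concrete, here is a minimal Python sketch of what I have in mind. It is not an existing tool, and the function names drop_symbolic and to_brace_text are made up for illustration. It assumes a plain-text, tab-separated VCF in which symbolic alleles are written in angle brackets in the ALT column (as in the error above), and variant records that are sorted by position and do not overlap. It ignores genotypes, multiple chromosomes and compressed input.

```python
def drop_symbolic(vcf_lines):
    """Yield header lines and those records whose ALT column has no symbolic allele.

    Assumption: a symbolic allele is any ALT entry starting with '<'.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        alt_column = line.rstrip("\n").split("\t")[4]  # ALT is the 5th column
        if any(allele.startswith("<") for allele in alt_column.split(",")):
            continue  # drop the whole record
        yield line


def to_brace_text(reference, records):
    """Render a reference sequence with variants in the brace form of question 4.

    records: iterable of (pos, ref, alts), with 1-based pos, sorted by position.
    Overlapping records are skipped (a simplification, not a proper treatment).
    """
    pieces = []
    cursor = 0  # 0-based index of the first reference base not yet written
    for pos, ref, alts in records:
        start = pos - 1
        if start < cursor:
            continue  # overlaps the previous variant
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match the reference at {pos}")
        pieces.append(reference[cursor:start])
        pieces.append("{" + ", ".join([ref] + list(alts)) + "}")
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    # The example from the question: REF "AT" at position 5, ALT "CC" and "C".
    print(to_brace_text("AGTTATACCT", [(5, "AT", ["CC", "C"])]))
```

Run on the example from the last question, this should print AGTT{AT, CC, C}ACCT. Dropping a whole record as soon as one of its ALT alleles is symbolic is the crudest possible choice; I do not know whether it is the right one for multi-allelic sites.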
Thank you very much for any idea that could move me one step further.
Petr.
Hi Kevin, can you help me check my case:
How to clean this strange VCF file?
Thanks.
Shicheng