Has someone already created a Python library for parsing bcftools stats output files?
bcftools stats generates a text stats file from VCF/BCF files. These stats files contain some tab-delimited tables and key-value pairs that I would like to access programmatically in Python.
Depending on the bcftools stats arguments used, the content of the stats file can differ: http://www.htslib.org/doc/bcftools.html#stats
It would be nice if this already exists, so that I don't need to write a bcftools stats parser myself.
https://pyvcf.readthedocs.io/en/latest/
pyvcf parses VCF files; I am looking for a parser for the stats files that bcftools stats generates from VCF/BCF files.
I'm attempting to do the same thing. Did you find a solution?
At least for tab-delimited tables, this is a more general issue that goes beyond this particular tool. The Pandas package is one of the main tools for structured data in the Python ecosystem.
Indeed, it looks like hap.py, Illumina's haplotype VCF comparison tool, has a function parseStats in bcftools.py that just collects the lines starting with SN and converts the data into a Pandas DataFrame for further analysis with Pandas. It looks pretty portable/adaptable. Is that all you need?
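For illustration, the idea of collecting the SN lines could be sketched roughly as below. This is not hap.py's actual code: the function name parse_summary_numbers, the dict keys, and the tiny example input are all made up here, and the sketch assumes SN rows have the four tab-separated fields section, id, key, value. It uses only the standard library.

```python
import io


def parse_summary_numbers(handle):
    """Collect the 'SN' rows of a bcftools stats file as a list of dicts.

    Hypothetical helper; assumes rows look like: SN <tab> id <tab> key: <tab> value
    """
    rows = []
    for line in handle:
        # Skip comment/header lines and rows that belong to other sections.
        if not line.startswith("SN\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed row, ignore it
        set_id, key, value = fields[1], fields[2], fields[3]
        rows.append({
            "id": int(set_id) if set_id.isdigit() else set_id,
            "key": key.rstrip(":"),  # drop the trailing colon of the key
            "value": int(value) if value.isdigit() else value,
        })
    return rows


# Made-up excerpt, for illustration only (not real bcftools output).
example = (
    "# SN\t[2]id\t[3]key\t[4]value\n"
    "SN\t0\tnumber of samples:\t1\n"
    "SN\t0\tnumber of records:\t120\n"
    "SN\t0\tnumber of SNPs:\t100\n"
    "TSTV\t0\t70\t30\t2.33\n"
)
rows = parse_summary_numbers(io.StringIO(example))
```

If Pandas is installed, the resulting list of dicts can be passed straight to pandas.DataFrame(rows) to get one row per summary number.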
The original post mentioned some "key value pairs", but I don't see that handled.
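On the key-value pairs: possibly the SN rows are what was meant, since each one carries a key and a value. A more general sketch is below. It groups every data row by its section tag and takes the column names from the "# TAG [2]name [3]name ..." comment line that appears to precede each table. The function name parse_sections, the "section" and "colN" column names, and the example input are assumptions made for this sketch, not part of bcftools or any existing library.

```python
import io
import re
from collections import defaultdict


def parse_sections(handle):
    """Group bcftools stats rows by section tag (hypothetical helper).

    Returns {tag: [row_dict, ...]}; all values are kept as strings.
    """
    columns = {}                 # tag -> list of column names
    tables = defaultdict(list)   # tag -> list of row dicts
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("# ") and len(fields) > 1:
            # Header comment such as "# SN<tab>[2]id<tab>[3]key<tab>[4]value".
            tag = fields[0][2:]
            names = [re.sub(r"^\[\d+\]", "", f) for f in fields[1:]]
            columns[tag] = ["section"] + names
        elif not line.startswith("#") and fields[0]:
            tag = fields[0]
            # Fall back to generic names if no header was seen for this tag.
            names = columns.get(tag) or [f"col{i}" for i in range(len(fields))]
            tables[tag].append(dict(zip(names, fields)))
    return dict(tables)


# Made-up excerpt, for illustration only (real files have more columns).
example = (
    "# SN\t[2]id\t[3]key\t[4]value\n"
    "SN\t0\tnumber of samples:\t1\n"
    "SN\t0\tnumber of records:\t120\n"
    "# TSTV\t[2]id\t[3]ts\t[4]tv\t[5]ts/tv\n"
    "TSTV\t0\t70\t30\t2.33\n"
)
tables = parse_sections(io.StringIO(example))

# The SN table collapses naturally into a key -> value mapping.
summary = {row["key"].rstrip(":"): row["value"] for row in tables["SN"]}
```

Because the column names come from the file itself, the same loop should cope with stats files produced with different bcftools stats arguments, as long as they follow this header convention.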