Python library for parsing bcftools stats file
7.6 years ago
William ★ 5.3k

Has anyone already created a Python library for parsing bcftools stats output files?

bcftools stats generates a text stats file from VCF/BCF files. These stats files contain some tab-delimited tables and key-value pairs that I would like to access programmatically in Python.

Depending on the bcftools stats arguments used, the content of the stats file can differ: http://www.htslib.org/doc/bcftools.html#stats

It would be nice if this already exists, so that I don't need to write a bcftools stats parser myself.

bcftools • 4.2k views

pyvcf parses VCF files; I am looking for a parser for the stats files that bcftools stats generates from VCF/BCF files.


I'm attempting to do the same thing. Did you find a solution?


"These text stats files contains some tab delimited tables and key value pairs ..."

At least for the tab-delimited tables, this is a more general issue that goes beyond this particular tool. The Pandas package is one of the main tools for structured data in the Python ecosystem.

Indeed, it looks like Illumina's haplotype VCF comparison tool, hap.py, has a function parseStats in bcftools.py that just collects the lines starting with SN and converts the data into a Pandas DataFrame for further analysis:

import pandas

def parseStats(output, colname="count"):
    """ Parse BCFTOOLS Stats Output (SN summary lines only) """

    result = {}
    for x in output.split("\n"):
        if x.startswith("SN"):
            vx = x.split("\t")
            # vx[2] is the key, e.g. "number of SNPs:"; vx[3] is its value
            name = vx[2].replace("number of ", "").replace(":", "")
            count = int(vx[3])
            result[name] = count

    # dict.items() replaces the Python 2-only dict.iteritems()
    result = pandas.DataFrame(list(result.items()), columns=["type", colname])
    return result

It looks pretty portable/adaptable. Is that all you need?
The original post mentioned some "key value pairs", but I don't see that handled.
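
If you need the other sections as well, the same idea can be generalised. Below is a minimal sketch, not an existing library: parse_bcftools_stats is a hypothetical name, and it assumes the usual layout in which each section is announced by a comment line such as "# SN<TAB>[2]id<TAB>[3]key<TAB>[4]value" and each data line starts with the section tag. The sample text uses made-up values purely for illustration. All values are kept as strings.

```python
import re
from collections import defaultdict

def parse_bcftools_stats(text):
    """Group the lines of a bcftools stats report by section tag (SN, ST, ...).

    Returns a dict mapping each tag to a list of row dicts.
    """
    headers = {}              # tag -> column names from "# TAG\t[2]...\t[3]..." lines
    rows = defaultdict(list)  # tag -> list of value lists
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Assumed header layout: "# SN\t[2]id\t[3]key\t[4]value"
            if len(fields) > 1 and fields[1].startswith("["):
                tag = fields[0].lstrip("# ").strip()
                headers[tag] = [re.sub(r"^\[\d+\]", "", f) for f in fields[1:]]
            continue
        rows[fields[0]].append(fields[1:])

    tables = {}
    for tag, values in rows.items():
        table = []
        for v in values:
            names = headers.get(tag, [])
            if len(names) != len(v):
                # No usable header: fall back to positional names (col2, col3, ...)
                names = ["col%d" % (i + 2) for i in range(len(v))]
            table.append(dict(zip(names, v)))
        tables[tag] = table
    return tables

# Made-up sample input, for illustration only
sample = (
    "# SN, Summary numbers:\n"
    "# SN\t[2]id\t[3]key\t[4]value\n"
    "SN\t0\tnumber of samples:\t1\n"
    "SN\t0\tnumber of records:\t100\n"
    "# ST, Substitution types:\n"
    "# ST\t[2]id\t[3]type\t[4]count\n"
    "ST\t0\tA>C\t5\n"
)
tables = parse_bcftools_stats(sample)
print(tables["SN"])
```

Each list of row dicts can then be passed to pandas.DataFrame(tables["SN"]) if you want to continue in Pandas, with numeric columns converted as needed.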
