Hello Everyone,
I have a VCF file containing variants for 7-8k patients (exome sequencing). It is not tabix-indexed, and I am trying to retrieve the following information:
- information on columns header
- number of monomorphic variants, number of multi-allelic variants
- number of SNPs versus number of indels
- depth of sequencing, and the human genome build used for each patient
- exact positions of monomorphic and multi-allelic variants
- frequency distribution of variants
I am looking for all of this information for all chromosomes together and on a per-chromosome basis. I am using VCFtools in a Linux environment, but being new to this field I have no clue how to proceed. All help is appreciated; the more detailed, the better.
Regards
Nihar
Responding to these follow-up questions:
bcftools stats <bgzipped vcf file name>
There are command-line options you can use to refine the stats, but the base command already outputs a lot of information; if you look through the results you'll see the data you are looking for.
To set an environment variable:
export ENV_VARIABLE="Something"
so for instance:
export BCFTOOLS_PLUGINS="/path/to/bcftools/plugin/directory"
You can find detailed instructions for setting environment variables for your particular OS/distribution fairly easily with a few Google searches if you need more information.
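If you also want the per-chromosome breakdown and the exact positions asked about in the question, one option is a small script of your own. The following is only a rough sketch in Python, not something bcftools or VCFtools provides: the names open_vcf, classify_alt and summarize are made up for illustration. It assumes a standard tab-separated VCF, treats a site as monomorphic when it has no ALT allele or when every called genotype carries the same single allele, and counts a multi-allelic site once under each ALT type it contains. Parsing genotypes for 7-8k samples in pure Python will be slow, so treat it as a starting point.

```python
import gzip
import re
from collections import Counter, defaultdict

NUCLEOTIDES = re.compile(r"^[ACGTNacgtn]+$")

def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def classify_alt(ref, alt):
    """Classify one ALT allele as 'snp', 'indel' or 'other'."""
    if not (NUCLEOTIDES.match(ref) and NUCLEOTIDES.match(alt)):
        return "other"  # symbolic alleles such as <DEL> or *
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # same-length substitutions longer than 1 bp (MNPs)

def summarize(lines):
    """Tally variant classes per chromosome from an iterable of VCF lines.

    Returns (counts, positions): counts maps chromosome -> Counter,
    positions lists (chrom, pos) for monomorphic and multi-allelic sites.
    """
    counts = defaultdict(Counter)
    positions = {"monomorphic": [], "multiallelic": []}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alts = [] if alt == "." else alt.split(",")
        counts[chrom]["records"] += 1

        if len(alts) > 1:
            counts[chrom]["multiallelic"] += 1
            positions["multiallelic"].append((chrom, pos))

        # Collect the distinct alleles actually called across samples.
        # Assumption: GT, when present, is the first FORMAT key.
        has_gt = len(fields) > 9 and fields[8].split(":")[0] == "GT"
        called = set()
        if has_gt:
            for sample in fields[9:]:
                gt = sample.split(":", 1)[0]
                called.update(a for a in re.split(r"[/|]", gt) if a != ".")

        if not alts or (has_gt and len(called) == 1):
            counts[chrom]["monomorphic"] += 1
            positions["monomorphic"].append((chrom, pos))

        # A site is counted once per ALT type it contains.
        for kind in {classify_alt(ref, a) for a in alts}:
            counts[chrom][kind] += 1
    return counts, positions

# Example use ("cohort.vcf.gz" is a placeholder file name):
# with open_vcf("cohort.vcf.gz") as handle:
#     counts, positions = summarize(handle)
# for chrom, tally in counts.items():
#     print(chrom, dict(tally))
```

Summing the per-chromosome counters gives the genome-wide totals, and the positions dictionary can be written out as a simple two-column list. Sequencing depth and allele frequencies are not covered by this sketch.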