Entering edit mode
4.1 years ago
yolek64754
▴
30
Hi there, I have been trying to compute "Pi" on my Ch22 using VCFtools; this is the pipeline I am using:
Data=$1
bcftools mpileup --ignore-RG -f /ref/human_g1k_v37.fa.gz $Data | bcftools call -m -Oz -f GQ -o $Data"_ND.vcf.gz"
zcat $Data"_ND.vcf.gz" | vcftools --vcf - --site-pi --out $Data"_nucleotide_diversity"
awk '$1== "22" { print $0 }' $Data"_nucleotide_diversity.sites.pi" > $Data"_nucleotide_diversity22.sites.pi"
Then the OUTPUT looks like this:
22 16060564 0
22 16060565 0
22 16060566 0
22 16060567 1
22 16060568 0
22 16060569 0
22 16060570 0
22 16060571 0
22 16060572 0
22 16060573 0
22 16060574 0
22 16060575 0
I was adding all the value of the 3rd column and then dividing it by the total number of sites on my chromosome, but after being the results, it does seems that this looks like I am computing the heterozegozity level and not Pi ?
How can I get the Pi (nucleotide diversity) using this output ? Or am I doing anything wrong in the command before ?
Thanks