Hi all
I'm using VCFtools (v0.1.17) to estimate nucleotide diversity (pi) in my study species.
I already have a VCF file produced by mapping to a draft genome, and I used it to calculate pi.
As you can see below, the output shows each bin and the number of variants in it (here I used --window-pi 60000 --window-pi-step 24000), and the PI value equals the number of variants divided by the bin size:
```
CHROM          BIN_START  BIN_END  N_VARIANTS  PI
scaffold22988  1          60000    11          0.000183333
```
The problem is that scaffold22988 is only 1,015 bp long, but VCFtools divided by the full bin size (60,000 bp) rather than by the length of the scaffold. As a result, the genome-wide average pi is underestimated when a large bin size is used.
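To make the size of the error concrete, here is the arithmetic for scaffold22988 as a small Python check. It assumes, as stated above for this data, that PI = N_VARIANTS / denominator; the variable names are just for illustration.

```python
# Numbers for scaffold22988, copied from the output above.
n_variants = 11
bin_size = 60000      # denominator VCFtools used (BIN_END - BIN_START + 1)
scaffold_len = 1015   # actual length of scaffold22988

pi_reported = n_variants / bin_size       # reproduces the PI column
pi_by_length = n_variants / scaffold_len  # PI if divided by the scaffold length instead

print(f"{pi_reported:.9f}")                  # 0.000183333
print(f"{pi_by_length:.6f}")                 # 0.010837
print(round(pi_by_length / pi_reported, 1))  # 59.1
```

So for this scaffold the reported value is roughly 59 times smaller than it would be if the scaffold length were used as the denominator.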
The same thing happens at the end of large scaffolds, for example:
```
CHROM       BIN_START  BIN_END   N_VARIANTS  PI
scaffold14  18960001   19020000  14          0.000233333
```
In fact, scaffold14 is only 18,967,204 bp long, so again the pi value for the last window of this scaffold is underestimated (the bin size here should be 18967204 - 18960001 + 1 = 7204).
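The bin-size correction above generalizes to any window: the effective length is min(BIN_END, scaffold length) - BIN_START + 1. A minimal Python sketch, assuming 1-based inclusive coordinates (as BIN_START = 1 in the output suggests) and that PI was divided by the full bin size; the function names are hypothetical, not part of VCFtools.

```python
def effective_window_length(bin_start: int, bin_end: int, scaffold_len: int) -> int:
    """Bases a window actually covers once clipped at the scaffold end (1-based, inclusive coordinates)."""
    return min(bin_end, scaffold_len) - bin_start + 1


def corrected_pi(reported_pi: float, bin_start: int, bin_end: int, scaffold_len: int) -> float:
    """Rescale a PI that was divided by the full bin size so it is divided by the clipped length instead."""
    full_size = bin_end - bin_start + 1
    return reported_pi * full_size / effective_window_length(bin_start, bin_end, scaffold_len)


# Last window of scaffold14, values copied from the output above.
print(effective_window_length(18960001, 19020000, 18967204))             # 7204
print(f"{corrected_pi(0.000233333, 18960001, 19020000, 18967204):.6f}")  # 0.001943
```

For scaffold22988 the same function gives effective_window_length(1, 60000, 1015) == 1015, i.e. the window is clipped to the scaffold itself, while a window lying entirely inside its scaffold keeps its full length and its PI is unchanged.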
Is there a way to tell VCFtools not to overestimate the bin size, i.e. to clip the last window at the end of each scaffold? I've read the VCFtools manual but did not find any option like this.
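In the meantime, one possible workaround is to post-process the windowed pi file after VCFtools has written it. The sketch below is hypothetical: it assumes scaffold lengths come from a samtools faidx index (name and length in the first two tab-separated columns of the .fai file), that the pi file has the CHROM/BIN_START/BIN_END/N_VARIANTS/PI header shown above, and that PI was divided by the full bin size. It keeps the original columns and appends EFFECTIVE_LEN and PI_CORRECTED.

```python
from __future__ import annotations

import csv
import sys
from typing import IO, Iterable


def read_scaffold_lengths(fai_lines: Iterable[str]) -> dict[str, int]:
    """Map scaffold name -> length from a samtools faidx index (.fai): column 1 = name, column 2 = length."""
    lengths: dict[str, int] = {}
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def rescale_windowed_pi(pi_lines: Iterable[str], lengths: dict[str, int], out: IO[str]) -> None:
    """Copy a windowed pi table, adding the clipped window length and PI rescaled to that length."""
    reader = csv.DictReader(pi_lines, delimiter="\t")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    fields = list(reader.fieldnames or [])
    writer.writerow(fields + ["EFFECTIVE_LEN", "PI_CORRECTED"])
    for row in reader:
        start, end = int(row["BIN_START"]), int(row["BIN_END"])
        eff_len = min(end, lengths[row["CHROM"]]) - start + 1  # clip window at scaffold end
        # Undo the division by the full bin size, then divide by the clipped length.
        pi_corr = float(row["PI"]) * (end - start + 1) / eff_len
        writer.writerow([row[k] for k in fields] + [eff_len, f"{pi_corr:.6g}"])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rescale_pi.py out.windowed.pi reference.fa.fai > corrected.windowed.pi
    with open(sys.argv[1]) as pi_fh, open(sys.argv[2]) as fai_fh:
        rescale_windowed_pi(pi_fh, read_scaffold_lengths(fai_fh), sys.stdout)
```

The script name and file names in the usage comment are placeholders. Adding columns instead of overwriting PI makes it easy to compare the reported and rescaled values before averaging across the genome.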
I'd be grateful for any suggestions.