Hi all,
I want to count the number of SNPs for each chromosome in a raw VCF file. What is the best way to do this?
Best regards,
Mostafa
UPDATE 2021: if your VCF is indexed: bcftools index -s indexed.vcf.gz
grep -v "^#" in.vcf | cut -f 1 | sort | uniq -c
Pierre's script works for me, Moustafa:
grep -v "^#" test.vcf | cut -f 1 | sort | uniq -c
16011 1
7308 10
9565 11
9149 12
3311 13
5881 14
5360 15
7016 16
8611 17
2896 18
9895 19
11621 2
3881 20
2472 21
3881 22
9215 3
7464 4
7805 5
10110 6
7991 7
6023 8
6898 9
37 MT
3218 X
21 Y
Chromosome 1 has 16011 variants... chromosome 9 has 6898, et cetera.
Your input VCF should be properly formatted and also be uncompressed.
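If the file is gzip- or bgzip-compressed, one option is to decompress it on the fly rather than writing an uncompressed copy. This is only a sketch: `example.vcf` and its three records are made up so the snippet runs on its own, and like the command above it counts all variant records, not only SNPs.

```shell
# Build a tiny example VCF (made-up records, for illustration only) and compress it
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tT\t.\t.\t.\n2\t150\t.\tG\tA\t.\t.\t.\n' > example.vcf
gzip -c example.vcf > example.vcf.gz

# Decompress to stdout, drop header lines, keep CHROM, count per chromosome
counts=$(gzip -dc example.vcf.gz | grep -v '^#' | cut -f 1 | sort | uniq -c)
echo "$counts"
```

The first column of the output is the record count and the second is the chromosome name, as in the listing above.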
Try VCFstats from RTGtools. But that would be stats per sample, not per chromosome. If you want per chromosome, per sample, then you may have to write a script.
Normalize your VCF and then count the records per chromosome with datamash. Datamash is in most of the Linux repos.
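The datamash command itself is missing from this post. The following is a guess at what such a pipeline could look like, not the poster's original command. It assumes the VCF has already been normalized (for example with `bcftools norm`) and uses a made-up `example_dm.vcf` so it runs on its own; the `command -v` guard is only there because datamash may not be installed.

```shell
# Build a tiny example VCF (made-up records, for illustration only)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tT\t.\t.\t.\n2\t150\t.\tG\tA\t.\t.\t.\n' > example_dm.vcf

if command -v datamash >/dev/null 2>&1; then
  # -s sorts the input, -g 1 groups by CHROM, "count 1" counts rows per group
  dm_counts=$(grep -v '^#' example_dm.vcf | cut -f 1 | datamash -s -g 1 count 1)
else
  dm_counts="datamash not installed"
fi
echo "$dm_counts"
```

With datamash available, each output line is the chromosome name followed by its record count, separated by a tab.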
with awk:
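The awk command did not survive in this post either. A minimal sketch of an equivalent one-liner, assuming an uncompressed, tab-delimited VCF (the file `example_awk.vcf` and its records are made up so the snippet is self-contained):

```shell
# Build a tiny example VCF (made-up records, for illustration only)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tC\tT\t.\t.\t.\n2\t150\t.\tG\tA\t.\t.\t.\n' > example_awk.vcf

# Skip header lines, tally column 1 (CHROM), then print "chromosome<TAB>count".
# awk's for-in order is unspecified, so the result is piped through sort.
awk_counts=$(awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' example_awk.vcf | sort)
echo "$awk_counts"
```

This counts every variant record. To restrict it to SNPs, the awk condition would also need to check that REF (column 4) and ALT (column 5) are single bases.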
I have adapted your title to make it more descriptive of what you are asking.