Count number of SNPs per chromosome in a VCF file
6.3 years ago

Hi all,

I want to count the number of SNPs on each chromosome in a raw VCF file. What is the best way to do this?

Best regards,

Mostafa


Normalize your VCF and then run one of the commands below. datamash is available in most Linux distribution repositories:

$ grep -v '^#' test.vcf | datamash -sg 1 count 1

with awk:

$ awk '!/^#/ { a[$1]++ } END {for (i in a) print i,a[i]}' test.vcf
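
Note that both commands count every variant record, so indels are included. If only SNPs should be counted, a minimal sketch along the same lines could filter on the REF and ALT columns. It assumes that a record is a SNP when REF and every ALT allele is a single A/C/G/T base; the function name and the demo file are made up for illustration:

```shell
# Count only SNP records per chromosome.
# Assumption: a SNP is a record whose REF and every ALT allele is a single base.
count_snps_per_chrom() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        $4 !~ /^[ACGTacgt]$/ { next }          # REF must be a single base
        {
            n = split($5, alt, ",")            # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (alt[i] !~ /^[ACGTacgt]$/) next   # skip indels, MNPs, symbolic alleles
            count[$1]++
        }
        END { for (chrom in count) print chrom "\t" count[chrom] }
    ' "$1" | sort -k1,1
}

# Tiny made-up VCF: one SNP and one deletion on chromosome 1, one multi-allelic SNP on 2
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tAT\tA\t.\t.\t.\n2\t300\t.\tC\tT,G\t.\t.\t.\n' > "$demo_vcf"

count_snps_per_chrom "$demo_vcf"
```

On the demo file the deletion at position 200 is skipped, so each chromosome is reported with a count of 1.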

I have adapted your title to make it more descriptive of what you are asking.

6.3 years ago

UPDATE 2021: if your VCF is indexed: bcftools index -s indexed.vcf.gz

grep -v "^#" in.vcf | cut -f 1 | sort | uniq -c
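
To see what each stage of this pipeline does, it can help to run the stages one at a time. The following is only a sketch on a tiny made-up VCF (the temporary file and its records are hypothetical); `cut -f 1` keeps the first tab-separated field of each line, which in a VCF is the CHROM column:

```shell
# Tiny made-up VCF with two records on chromosome 1 and one on chromosome 2
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n2\t300\t.\tC\tT\t.\t.\t.\n1\t200\t.\tG\tC\t.\t.\t.\n' > "$demo"

# 1. Drop header lines (those starting with '#'), leaving one line per variant record
grep -v "^#" "$demo"

# 2. Keep only the first tab-separated field, i.e. the CHROM column
grep -v "^#" "$demo" | cut -f 1

# 3. Sort so identical chromosome names are adjacent, then count each run of names
per_chrom=$(grep -v "^#" "$demo" | cut -f 1 | sort | uniq -c)
echo "$per_chrom"
```

The final output has the count in the first column and the chromosome name in the second.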

Many thanks for your reply.

Does -f 1 mean the number of chromosomes?


I apologize, but this script does not work for me.


Pierre's script works for me, Moustafa:

grep -v "^#" test.vcf | cut -f 1 | sort | uniq -c
  16011 1
   7308 10
   9565 11
   9149 12
   3311 13
   5881 14
   5360 15
   7016 16
   8611 17
   2896 18
   9895 19
  11621 2
   3881 20
   2472 21
   3881 22
   9215 3
   7464 4
   7805 5
  10110 6
   7991 7
   6023 8
   6898 9
     37 MT
   3218 X
     21 Y

Chromosome 1 has 16011 variants... chromosome 9 has 6898, et cetera.

Your input VCF should be properly formatted and also be uncompressed.
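
If the VCF is gzip- or bgzip-compressed, one option is to decompress it on the fly instead of writing an uncompressed copy. This is a rough sketch, assuming compressed files end in `.gz`; the function name and the demo files are hypothetical:

```shell
# Count records per chromosome for a plain or gzip/bgzip-compressed VCF.
# Assumption: compressed inputs are recognised by a .gz suffix.
count_per_chrom() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # bgzip output is gzip-compatible
        *)    cat "$1" ;;
    esac | grep -v "^#" | cut -f 1 | sort | uniq -c
}

# Tiny made-up VCF, stored both uncompressed and compressed
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t200\t.\tG\tC\t.\t.\t.\nX\t300\t.\tC\tT\t.\t.\t.\n' > "$tmpdir/demo.vcf"
gzip -c "$tmpdir/demo.vcf" > "$tmpdir/demo.vcf.gz"

count_per_chrom "$tmpdir/demo.vcf"
count_per_chrom "$tmpdir/demo.vcf.gz"
```

Both calls should report the same counts, since only the way the file is read differs.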


Yes, I understand now. Thank you very much for the explanation.


If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.


Now, if I want to count the number of SNPs for each breed, what is the best way to do it? I have 5 breeds in my raw VCF.


mostafarafiepour, try vcfstats from RTG Tools. That reports statistics per sample, though, not per chromosome. If you want counts per chromosome and per sample, you may have to write a script.
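
Such a script could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested pipeline: it assumes GT is the first FORMAT subfield, and it counts a record for a sample whenever that sample's genotype carries at least one non-reference allele. The function name and the demo data are hypothetical. Grouping the per-sample counts by breed would additionally need a sample-to-breed table, which is not shown here.

```shell
# Count, per chromosome and per sample, the records where the sample
# carries at least one non-reference allele.
# Assumption: GT is the first FORMAT subfield (records without GT are skipped).
count_per_chrom_per_sample() {
    awk -F'\t' '
        /^##/ { next }                                     # skip meta-information lines
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }   # remember sample names
        $9 !~ /^GT/ { next }                               # no genotype field in this record
        {
            for (i = 10; i <= NF; i++) {
                split($i, field, ":")                      # the genotype comes before the first ":"
                if (field[1] ~ /[1-9]/)                    # any allele index other than 0
                    count[$1 "\t" name[i]]++
            }
        }
        END { for (key in count) print key "\t" count[key] }
    ' "$1" | LC_ALL=C sort
}

# Tiny made-up VCF with two samples, S1 and S2
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\n1\t200\t.\tC\tT\t.\t.\t.\tGT\t1/1\t0/1\n2\t300\t.\tG\tA\t.\t.\t.\tGT\t0/0\t./.\n' > "$demo"

count_per_chrom_per_sample "$demo"
```

The output is tab-separated: chromosome, sample name, count. Samples with only reference or missing genotypes on a chromosome do not appear for that chromosome.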
