Hi all,
I have a VCF file containing 50 samples, and I want to count the number of SNPs. My organism is a non-model organism, so its reference has no assembled chromosomes.
How can I count the number of SNPs for all 50 samples with this VCF?
Best regards,
Mostafa
Hello,
You get the total number of variants by counting the lines in the VCF, excluding the header lines.
$ grep -v "^#" input.vcf|wc -l
fin swimmer
Please describe the details in the photo for me.
How is it related to your original question? Are you sure you're using the correct terms?
Each line of the VCF is a VARIANT. A variant can be a SNP, an INDEL, etc.
The intersection of the Variant and the Samples' names is a GENOTYPE.
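To make the variant/sample/genotype distinction concrete, here is a minimal sketch on a made-up VCF (the scaffold names, sample names S1 to S3 and the genotypes are invented for illustration only). It assumes nothing beyond grep and awk:

```shell
# Build a tiny, made-up VCF (3 samples, 4 records) purely for illustration.
tmp_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'scaffold_1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
  printf 'scaffold_1\t250\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1\t./.\n'
  printf 'scaffold_2\t40\t.\tGA\tG\t50\tPASS\t.\tGT\t0/1\t0/1\t0/0\n'
  printf 'scaffold_2\t75\t.\tT\tA,C\t50\tPASS\t.\tGT\t1/2\t0/0\t0/1\n'
} > "$tmp_vcf"

# Variants: every non-header line is one variant record.
n_variants=$(grep -vc '^#' "$tmp_vcf")

# Samples: the columns after the 9 fixed ones on the #CHROM header line.
n_samples=$(awk -F'\t' '/^#CHROM/ {print NF - 9}' "$tmp_vcf")

# Genotypes: one per (variant, sample) pair.
n_genotypes=$((n_variants * n_samples))

echo "variants=$n_variants samples=$n_samples genotypes=$n_genotypes"
rm -f "$tmp_vcf"
```

On a real file, replace the temporary file with your own VCF. The number of variants does not change with the number of samples; only the number of genotypes does.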
Hello mostafarafiepour,
You started with one question. In the meantime there are three :)
1. How to read a vcf file
This is a very basic question, so you need to start with some literature:
If you don't understand any of the explanations, don't hesitate to ask.
2. How to count the variants in a vcf (your original question)
3. Is the resulting number from (2) correct?
Well, that's quite hard to say without knowing anything about your genome. How large is it? Is there a high diversity between individuals? As we just have the total number of different variants across all of your samples, it might be better to get a per-sample count. The output of bcftools stats (as suggested by cpad0112) might be useful, or have a look at this thread, especially the answers by Pierre and me.
fin swimmer
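As a rough sketch of what a per-sample count could look like without bcftools, the awk script below counts, for each sample, the SNP sites where that sample carries at least one non-reference allele. The toy VCF and its sample names are made up, and the script assumes GT is the first FORMAT field (which the VCF specification requires when GT is present):

```shell
# Made-up VCF for illustration: 2 biallelic SNPs, 1 deletion, 1 multiallelic SNP.
tmp_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'scaffold_1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
  printf 'scaffold_1\t250\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1\t./.\n'
  printf 'scaffold_2\t40\t.\tGA\tG\t50\tPASS\t.\tGT\t0/1\t0/1\t0/0\n'
  printf 'scaffold_2\t75\t.\tT\tA,C\t50\tPASS\t.\tGT\t1/2\t0/0\t0/1\n'
} > "$tmp_vcf"

per_sample=$(awk -F'\t' '
  # Remember the sample names from the #CHROM header line.
  /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; n = NF; next }
  # Skip the remaining header lines.
  /^#/ { next }
  # Keep only sites where REF and every ALT allele are single bases.
  $4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT](,[ACGT])*$/ { next }
  {
    for (i = 10; i <= n; i++) {
      split($i, f, ":")               # f[1] is the GT string, e.g. 0/1
      if (f[1] ~ /[1-9]/) count[i]++  # any non-reference allele index
    }
  }
  END { for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], count[i] }
' "$tmp_vcf")

printf '%s\n' "$per_sample"
rm -f "$tmp_vcf"
```

Missing genotypes (./.) and homozygous-reference calls (0/0) are not counted, so a sample's number can be much lower than the total number of SNP records in the file.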
Here is a quick way to count biallelic SNPs in vcf.gz files (use "cat" instead of "zcat" for uncompressed vcf files):
zcat input.vcf.gz | awk '{if ($4~/^[ACGT]$/ && $5~/^[ACGT]$/){c++}} END {print c+0}'
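To see how this filter differs from simply counting lines, here is a small sketch on invented records (the toy_vcf helper and its contents are hypothetical, used only to have something to pipe into awk). The third count uses a slightly relaxed pattern that also accepts multiallelic SNPs such as T to A,C:

```shell
# Hypothetical helper that prints a made-up, sites-only VCF:
# 2 biallelic SNPs, 1 deletion, 1 multiallelic SNP.
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'scaffold_1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'scaffold_1\t250\t.\tC\tT\t50\tPASS\t.\n'
  printf 'scaffold_2\t40\t.\tGA\tG\t50\tPASS\t.\n'
  printf 'scaffold_2\t75\t.\tT\tA,C\t50\tPASS\t.\n'
}

# All variant records (SNPs, indels, everything).
n_records=$(toy_vcf | grep -vc '^#')

# Biallelic SNPs only: single-base REF and a single single-base ALT.
n_biallelic=$(toy_vcf | awk '$4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {c++} END {print c + 0}')

# SNPs including multiallelic sites: ALT may be a comma-separated list of bases.
n_all_snps=$(toy_vcf | awk '$4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT](,[ACGT])*$/ {c++} END {print c + 0}')

echo "records=$n_records biallelic_snps=$n_biallelic snps_incl_multiallelic=$n_all_snps"
```

Which of the three numbers is "the number of SNPs" depends on whether you want to include multiallelic sites, so it is worth deciding that before comparing counts between tools.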
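If all the variants in your VCF are SNPs, a very quick way is to index the file first and then run index again with the -n flag, which prints the number of records. Note that bcftools index needs a bgzip-compressed file, so compress a plain data.vcf with bgzip first:
bcftools index data.vcf.gz
bcftools index -n data.vcf.gz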
Run
bcftools stats
on the VCF. It will summarize the VCF with most of the details you need.
Try this:
bcftools query -f '%POS\n' file.vcf.gz | wc -l