How to calculate the number of SNPs in each sample in a multi-sample VCF file
3.2 years ago
Kash ▴ 110

Hi everyone,

I have a multi-sample VCF file with 23 samples generated by the GATK pipeline. I want to find how many SNPs are in each sample (the sum of the 1/1 and 0/1 SNP genotypes for each sample). Can someone please tell me how to do this?

Thank you

vcf gatk multi SNP • 3.7k views
3.2 years ago

bcftools stats --samples '-' in.vcf.gz
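
As a cross-check on the bcftools numbers, the same per-sample tally can be sketched directly in Python. This is only an illustrative sketch, not how bcftools computes its counts: the function names `open_vcf` and `count_snps_per_sample` are made up here, and it assumes a site counts as a SNP only when REF and every ALT allele are single bases, so mixed or multiallelic sites may be counted differently than in `bcftools stats`.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_snps_per_sample(lines):
    """Count non-reference SNP genotypes (e.g. 0/1, 1/1) per sample."""
    samples = []
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            counts = {s: 0 for s in samples}
            continue
        ref, alts = fields[3], fields[4].split(",")
        # Assumption: a SNP site has a single-base REF and single-base ALTs.
        if len(ref) != 1 or any(len(a) != 1 or a in {".", "*"} for a in alts):
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, call in zip(samples, fields[9:]):
            alleles = call.split(":")[gt_idx].replace("|", "/").split("/")
            # Non-reference if any called allele is neither missing nor REF.
            if any(a not in (".", "0") for a in alleles):
                counts[sample] += 1
    return counts


# Usage (in.vcf.gz is the file name from the command above):
# with open_vcf("in.vcf.gz") as handle:
#     for sample, n in count_snps_per_sample(handle).items():
#         print(sample, n)
```

For a 23-sample whole-genome VCF this will be much slower than bcftools, so it is mainly useful for spot-checking a small region.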


Thank you, Pierre.

The following is a portion of the output produced by bcftools stats for my multi-sample VCF file. The sum of columns [5]nNonRefHom and [6]nHets gives the number of SNPs in each sample.

I wonder whether the [10]average depth value equals the average depth of coverage we would find in the BAM files of the respective samples. Please let me know if you have any insight into column [10]average depth.

PSC, Per-sample counts. Note that the ref/het/hom counts include only SNPs, for indels see PSI. The rest include both SNPs and indels.

# PSC  [2]id  [3]sample  [4]nRefHom  [5]nNonRefHom  [6]nHets  [7]nTransitions  [8]nTransversions  [9]nIndels  [10]average depth  [11]nSingletons  [12]nHapRef  [13]nHapAlt  [14]nMissing
PSC    0      3517       72867328    1946700        3767477   3734562          1897030            732106      20.7               340848           0            0            336589
PSC    0      3519       72781946    1910019        3885015   3792473          1919999            737150      22.6               351755           0            0            336253
PSC    0      683610     74080947    1261182        3399064   3048662          1552180            569521      17.9               107872           0            0            346298
PSC    0      686521     74340225    1224973        3245773   2918053          1489168            556105      18.5               95907            0            0            288281
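
To avoid adding columns [5]nNonRefHom and [6]nHets by hand for all 23 samples, the PSC rows can be summed with a few lines of Python. This is a minimal sketch: the function name `snps_per_sample` is hypothetical, and it splits on any whitespace (as in the pasted output above), which assumes sample names contain no spaces.

```python
def snps_per_sample(stats_lines):
    """Return {sample: nNonRefHom + nHets} from bcftools stats PSC rows."""
    totals = {}
    for line in stats_lines:
        # Data rows start with "PSC"; the header row starts with "# PSC".
        if not line.startswith("PSC"):
            continue
        fields = line.split()
        # fields: PSC, id, sample, nRefHom, nNonRefHom, nHets, ...
        sample = fields[2]
        totals[sample] = int(fields[4]) + int(fields[5])
    return totals


# The four rows shown above, copied from the bcftools stats output.
psc_rows = [
    "PSC 0 3517 72867328 1946700 3767477 3734562 1897030 732106 20.7 340848 0 0 336589",
    "PSC 0 3519 72781946 1910019 3885015 3792473 1919999 737150 22.6 351755 0 0 336253",
    "PSC 0 683610 74080947 1261182 3399064 3048662 1552180 569521 17.9 107872 0 0 346298",
    "PSC 0 686521 74340225 1224973 3245773 2918053 1489168 556105 18.5 95907 0 0 288281",
]

for sample, n_snps in snps_per_sample(psc_rows).items():
    print(sample, n_snps)
# 3517 5714177
# 3519 5795034
# 683610 4660246
# 686521 4470746
```

To run it on the full report, save the bcftools stats output to a file and pass the open file handle to `snps_per_sample` instead of `psc_rows`.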