Count SNPs per individual FAST!
6.6 years ago

Say I have a large (N samples > 2000) VCF or plink bed file.

What's the quickest way to count alleles (unique alleles and number of ALT alleles) for each sample?

What options can quickly digest a 1 TB VCF (split by chromosome)?

Plink is ridiculously fast for this kind of work, but I don't think it can produce a per-sample count of variants.

plink SNP gwas • 2.8k views

In the title you want to count the number of SNPs per individual; in the body you want the number of alleles for each sample. Please show us the expected output.


Either

IID    #UNIQUE_ALLELES

Or

IID   #UNIQUE_ALLELES    #ALT_ALLELES

So for a 1/1 genotype, the first format would output "ID1 1", while the second would output "ID1 1 2".
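To make the two counts concrete, here is a minimal Python sketch of the counting rule implied by the example above. The function name count_alleles and the list-of-GT-strings input are assumptions for illustration only, and this is not meant as a fast solution for a 1 TB VCF; it just pins down what the two columns mean.

```python
def count_alleles(genotypes):
    """Return (unique ALT alleles, total ALT allele copies) for one sample.

    `genotypes` is a list of VCF GT strings such as "0/1" or "1|1",
    one per site (hypothetical input format, chosen for illustration).
    """
    n_unique = 0
    n_alt = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        alleles = gt.replace("|", "/").split("/")
        # Drop missing calls ('.') and the reference allele ('0').
        alt = [a for a in alleles if a not in (".", "0")]
        n_unique += len(set(alt))  # distinct ALT alleles at this site
        n_alt += len(alt)          # ALT allele copies at this site
    return n_unique, n_alt


print("ID1", *count_alleles(["1/1"]))  # ID1 1 2
```

Under this rule a heterozygous 0/1 call adds 1 to both columns, and a homozygous 1/1 call adds 1 to the first column and 2 to the second.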

6.6 years ago

plink --score can be abused for this purpose.

  • Create an input file assigning weight 1 to every alt allele to get #ALT_ALLELES.
  • Then repeat the --score computation after erasing all the heterozygous calls ("plink --set-hh-missing --chr-set -26 --make-bed"; it may be necessary to run "--output-chr 26 --make-bed" first to force numeric chromosome codes). With both --score output files in hand, you should be able to infer #UNIQUE_ALLELES.
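The inference in the last step can be sketched as simple arithmetic. This is an illustration under stated assumptions, not something confirmed in the thread: all sites are biallelic, both --score runs report raw per-sample sums rather than averages (I believe the "sum" modifier does this in PLINK 1.9, but check the documentation for your version), and a homozygous ALT call still contributes 2 in the second run. The function name infer_counts is hypothetical.

```python
def infer_counts(alt_all, alt_hom_only):
    """Infer (#UNIQUE_ALLELES, #ALT_ALLELES) from two per-sample score sums.

    alt_all      -- ALT allele count with all calls kept (het = 1, hom ALT = 2)
    alt_hom_only -- ALT allele count after heterozygous calls were erased
    Assumes biallelic sites and raw sums (see the assumptions above).
    """
    alt_all = int(round(alt_all))
    alt_hom_only = int(round(alt_hom_only))
    n_hom_alt = alt_hom_only // 2      # each hom ALT call carries 2 copies
    n_het = alt_all - alt_hom_only     # each het call carries 1 copy
    return n_het + n_hom_alt, alt_all


# Example: 3 heterozygous calls and 1 homozygous ALT call.
print(infer_counts(5, 2))  # (4, 5)
```

In other words, under these assumptions #UNIQUE_ALLELES equals the first sum minus half of the second.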