6.6 years ago
QVINTVS_FABIVS_MAXIMVS
Say I have a large (N samples > 2000) VCF or plink bed file.
What's the quickest way to calculate the number of alleles (unique alleles and # ALT genotypes) for each sample?
What are the options that can quickly digest a 1 TB VCF (split by chromosome)?
Plink is ridiculously fast for this, but I don't think it can perform a per-sample count of variants.
In the title you want to count the number of SNPs per individual; in the body you want the number of alleles for each sample. Please show us the expected output.
Either
Or
So the first file would output
ID1 1
for a 1/1 genotype, but for the second file you have
ID1 1 2
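To make the two possible meanings concrete, here is a minimal sketch in plain Python that reports both numbers per sample: the number of non-reference genotypes (sites where the sample carries at least one ALT allele) and the number of ALT allele copies. The function name `count_alt_per_sample` and the output layout are assumptions for illustration, not something from the question, and a line-by-line Python parser will be far slower than a compiled tool on a 1 TB file.

```python
import re

def count_alt_per_sample(lines):
    """Return {sample: [non_ref_genotypes, alt_allele_copies]} from VCF text lines.

    Hypothetical helper: assumes tab-separated VCF with a #CHROM header line
    and a GT key in the FORMAT column.
    """
    samples = []
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at column 10
            counts = {s: [0, 0] for s in samples}
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotypes recorded at this site
        gt_idx = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_idx]
            alleles = re.split(r"[/|]", gt)  # unphased (/) or phased (|)
            # Any allele index other than 0 (REF) or . (missing) is an ALT copy.
            n_alt = sum(1 for a in alleles if a not in ("0", "."))
            if n_alt:
                counts[sample][0] += 1      # one more non-reference genotype
                counts[sample][1] += n_alt  # ALT allele copies at this site
    return counts
```

With this definition a 1/1 genotype adds 1 to the genotype count and 2 to the allele count, while 0/1 adds 1 to each. For compressed input, the same function can be fed the lines of a file opened with `gzip.open(path, "rt")`, one chromosome file at a time.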