Calculating heterozygosity for each SNP
5.9 years ago

Hi all,

I want to calculate the heterozygosity of each SNP. After reading the PLINK manual, I tried to do this with the following command:

plink  --make-bed --file purebred411_qc --freqx --out freqx_411

And I got this output:

CHR  SNP          A1  A2  C(HOM A1)  C(HET)  C(HOM A2)  C(HAP A1)  C(HAP A2)  C(MISSING)
1    AX-85111653  3   1   45         187     178        0          0          1
1    AX-85043398  2   4   45         186     180        0          0          0
1    AX-85051079  4   2   5          71      335        0          0          0
1    AX-85154093  4   2   5          72      332        0          0          2
1    AX-85063459  3   1   56         199     155        0          0          1

First, did I calculate the heterozygosity of each SNP correctly?

Second, if so, how can I calculate the percentage of heterozygosity for each SNP?

Best regards,

Mostafa

5.9 years ago

Hello Mostafa,

Here is what the plink manual states:

Allele frequency

  • --freq <counts | case-control> <gz>
  • --freqx <gz> (alias: --frqx)

By itself, --freq writes a minor allele frequency report to plink.frq. If you add the 'counts' modifier, an allele count report is written to plink.frq.count instead. Alternatively, you can use --freq with --within/--family to write a cluster-stratified frequency report to plink.frq.strat, or use the 'case-control' modifier to write a case/control phenotype-stratified report to plink.frq.cc.

--freqx writes a more informative genotype count report to plink.frqx.

For both flags, gzipped output can be requested with the 'gz' modifier.

Nonfounders are normally excluded from these counts/frequencies; use --nonfounders to change this.

All of these reports (except for --freq + --within/--family) are valid input for --read-freq; --freqx is the most powerful when used in that capacity, since it preserves deviation from Hardy-Weinberg equilibrium.

[source: https://www.cog-genomics.org/plink/1.9/basic_stats#freq]
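To illustrate the remark about Hardy-Weinberg equilibrium: genotype counts let you compare the observed heterozygote proportion with the proportion expected from the allele frequencies (2pq), which an allele frequency alone cannot tell you. The following is only a rough sketch. The function name is made up for this example and is not part of PLINK, and it assumes a diploid autosomal SNP with the three genotype counts taken from a .frqx line.

```python
def observed_and_expected_het(hom_a1: int, het: int, hom_a2: int) -> tuple[float, float]:
    """Return (observed, expected) heterozygote proportions for one SNP.

    Hypothetical helper for illustration; expected is 2pq under
    Hardy-Weinberg equilibrium, with p the A1 allele frequency.
    """
    called = hom_a1 + het + hom_a2  # non-missing diploid genotypes
    if called == 0:
        raise ValueError("no called genotypes for this SNP")
    p = (2 * hom_a1 + het) / (2 * called)  # A1 allele frequency
    observed = het / called
    expected = 2 * p * (1 - p)
    return observed, expected


# Genotype counts 45 / 187 / 178 (hom A1 / het / hom A2) as an example.
obs, exp = observed_and_expected_het(45, 187, 178)
print(f"observed={obs:.4f} expected={exp:.4f}")
```

With only the allele frequency p you could compute the expected value, but not the observed one, which is why the genotype count report carries more information.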

----------------------------------------------------

You used --freqx. Here is a description of the output:

.frqx (genotype count report)

Produced by --freqx. Valid input for --read-freq.

A text file with a header line, and then one line per variant with the following ten fields:

  • CHR Chromosome code
  • SNP Variant identifier
  • A1 Allele 1 (usually minor)
  • A2 Allele 2 (usually major)
  • C(HOM A1) A1 homozygote count
  • C(HET) Heterozygote count
  • C(HOM A2) A2 homozygote count
  • C(HAP A1) Haploid A1 count (includes male X chromosome)
  • C(HAP A2) Haploid A2 count
  • C(MISSING) Missing genotype count

[source: https://www.cog-genomics.org/plink/1.9/formats#frqx]
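Given these fields, one way to get a per-SNP heterozygosity percentage is to divide C(HET) by the number of non-missing diploid genotypes, C(HOM A1) + C(HET) + C(HOM A2). Below is a minimal sketch of that arithmetic. The function names are hypothetical, the reader assumes the .frqx file is tab-separated with the ten fields in the order listed above, and haploid and missing counts are deliberately left out of the denominator.

```python
from typing import Iterator


def observed_het_percent(hom_a1: int, het: int, hom_a2: int) -> float:
    """Percent of non-missing diploid genotypes that are heterozygous."""
    called = hom_a1 + het + hom_a2
    if called == 0:
        return float("nan")  # no called genotypes, heterozygosity undefined
    return 100.0 * het / called


def read_frqx(path: str) -> Iterator[tuple[str, float]]:
    """Yield (SNP id, heterozygosity percent) per variant.

    Assumes a tab-separated file with one header line and the column
    order CHR, SNP, A1, A2, C(HOM A1), C(HET), C(HOM A2), ...
    """
    with open(path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            hom_a1, het, hom_a2 = (int(x) for x in fields[4:7])
            yield fields[1], observed_het_percent(hom_a1, het, hom_a2)


# The five SNPs from the question: (SNP, C(HOM A1), C(HET), C(HOM A2)).
rows = [
    ("AX-85111653", 45, 187, 178),
    ("AX-85043398", 45, 186, 180),
    ("AX-85051079", 5, 71, 335),
    ("AX-85154093", 5, 72, 332),
    ("AX-85063459", 56, 199, 155),
]
for snp, hom_a1, het, hom_a2 in rows:
    print(f"{snp}\t{observed_het_percent(hom_a1, het, hom_a2):.2f}")
```

For the first SNP this is 187 / (45 + 187 + 178) = 187 / 410, or about 45.6%. With --out freqx_411, the report should be in freqx_411.frqx, so something like `for snp, pct in read_frqx("freqx_411.frqx")` would cover the whole file.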

----------------------------------------------------

Final piece of information: it looks like your bases are encoded in 1,2,3,4 format (A,C,G,T == 1,2,3,4).
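If you want to read the A1 and A2 columns as letters, the numeric codes can be translated with a small lookup. This is just a sketch based on the A,C,G,T == 1,2,3,4 mapping mentioned above; the names are made up for illustration, and any other code (for example a missing-allele code) is passed through unchanged.

```python
# Mapping assumed from the 1,2,3,4 == A,C,G,T convention noted above.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T"}


def decode_allele(code: str) -> str:
    """Translate a numeric allele code to a base; leave unknown codes as-is."""
    return ALLELE_CODES.get(code, code)


# First SNP in the question has A1 = 3 and A2 = 1.
print(decode_allele("3"), decode_allele("1"))
```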

So, now you should understand your output and, I believe, you will know whether or not you have chosen the correct program / command.

Kevin
