Hello,
I have a replication dataset of 12 markers in ~2300 unrelated cases and ~1200 unrelated controls. I want to check the heterozygosity rate to see whether I should exclude any individuals. I used PLINK:
plink --bfile QC_file --het --out QC_het
and then calculated the rate as (N(NM) - O(Hom))/N(NM).
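A minimal Python sketch of that calculation, assuming the usual .het column layout (FID, IID, O(HOM), E(HOM), N(NM), F). The three sample rows and the function name het_rates are made up for illustration and are not taken from my data:

```python
import io

# Hypothetical three-sample excerpt in the PLINK .het column layout
# (values invented for illustration only).
SAMPLE_HET = """\
FID IID O(HOM) E(HOM) N(NM) F
F1 I1 12 9.1 12 1.0
F2 I2 9 9.1 12 -0.03448
F3 I3 10 8.4 11 0.6154
"""


def het_rates(handle):
    """Return {(FID, IID): (N(NM) - O(HOM)) / N(NM)} for a .het-style table."""
    header = handle.readline().split()
    i_hom = header.index("O(HOM)")
    i_nnm = header.index("N(NM)")
    rates = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        o_hom = float(fields[i_hom])
        n_nm = float(fields[i_nnm])
        # A sample with no non-missing genotypes has no defined rate.
        rate = (n_nm - o_hom) / n_nm if n_nm else float("nan")
        rates[(fields[0], fields[1])] = rate
    return rates


rates = het_rates(io.StringIO(SAMPLE_HET))
for sample, rate in rates.items():
    print(sample, round(rate, 3))
```

For the real output of the command above, the same function could be given the file handle from open("QC_het.het") instead of the in-memory sample.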
With this the heterozygosity range is from 0 to 0.23.
Heterozygosity rate    0      0.05   0.1   0.2
No. of individuals     2924   553    33    1
The inbreeding coefficient is mostly 1, and for some individuals it is negative.
I am not sure how to interpret this or how to filter individuals based on this range. Any help is appreciated.
Many thanks
Yes, the final column, "F", is mostly 1, and some values are negative.
Sorry, then my answer would be the other way around: almost all of the samples are heterogeneous (the population is substantially heterogeneous and may include a number of ethnic groups). Otherwise, the samples are contaminated.
The samples are from 3 different countries. For the analysis, the samples are corrected for geographic origin, but I am just not sure whether it's sensible to remove individuals or to just keep them, considering the dataset is really small.
What does the negative value of F indicate?
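For reference, as far as I understand the --het output, F is computed per individual as (O(HOM) - E(HOM)) / (N(NM) - E(HOM)), so its sign depends only on whether the observed homozygous count is above or below the expected one. A small Python sketch with made-up numbers (12 markers, an assumed expected homozygous count of 9.0; the function name inbreeding_f is hypothetical):

```python
import math


def inbreeding_f(o_hom, e_hom, n_nm):
    """F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM)); NaN if the denominator is 0."""
    denom = n_nm - e_hom
    if denom == 0:
        return math.nan
    return (o_hom - e_hom) / denom


# Hypothetical individuals typed at 12 markers, expected homozygous count 9.0.
all_homozygous = inbreeding_f(12, 9.0, 12)   # every genotype homozygous
as_expected = inbreeding_f(9, 9.0, 12)       # observed equals expected
fewer_than_expected = inbreeding_f(8, 9.0, 12)  # more heterozygous calls than expected

print(all_homozygous, as_expected, fewer_than_expected)
```

Under these assumptions, an individual who is homozygous at every marker gets F = 1, and a negative F means fewer homozygous genotypes (that is, more heterozygous ones) than expected from the allele frequencies. With only 12 markers, a single genotype changes F a lot, so the values are probably very noisy.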