Hi,
I am not from a population genetics background, but I am running a feasibility test on a few organisms toward a larger goal of determining allele-specific expression (ASE). I am performing variant calling with GATK 4.0.0.0 on DNA-seq samples from 4 water buffaloes (Bubalus bubalis) to identify the heterozygous sites in each sample. I also have RNA-seq data from the same 4 animals, which will allow me to take my ASE project forward.
How important is the heterozygosity (hets) value? I have read https://gatkforums.broadinstitute.org/gatk/discussion/8603/heterozygosity. Is the default human heterozygosity value of 0.001 appropriate for my non-model organism?
I also came across this website: https://gatkforums.broadinstitute.org/gatk/discussion/8603/heterozygosity
How can I determine heterozygosity for my species? Where should I look to find this value for my non-model organism?
A simple explanation would be appreciated, as it would help me understand how much weight this parameter carries.
I do not know anything specific about Bubalus bubalis, but the first question to ask is how you are performing the variant calls. I assume there is a reference genome? Are there multiple? Another important question is whether previous studies that generated Bubalus bubalis genomes reported any heterozygosity statistics. You may also want to look into the heterozygosity of other members of the family Bovidae (e.g. cattle). I am not an expert, but I would assume heterozygosity is fairly similar among members of the same family.
I would guess there is a much larger body of evidence for cattle than for water buffalo.
Hi Dylan, there is only one reference genome, and I am using it for variant calling. There is a recent paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0185220
Table 1 contains some data on heterozygosity. Is that what I should look at? If yes, how should I interpret it?
I can also check cattle for this value.
I am not entirely sure how to interpret that particular paper's SNP frequencies, but you may be able to compare it with this paper on human heterozygosity to get a good benchmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586588/
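As a rough illustration of the kind of benchmark meant here, a per-site heterozygosity can be approximated as the number of heterozygous sites in one individual divided by the number of bases that could be called. This is only a sketch: the function name and the counts below are hypothetical placeholders, not values from either paper, and a proper population-level estimate would need more care.

```python
def per_site_heterozygosity(n_het_sites: int, n_callable_bases: int) -> float:
    """Approximate heterozygosity as het sites per callable base in one individual."""
    if n_callable_bases <= 0:
        raise ValueError("n_callable_bases must be positive")
    if n_het_sites < 0:
        raise ValueError("n_het_sites must not be negative")
    return n_het_sites / n_callable_bases


# Hypothetical counts chosen only for illustration (NOT taken from any paper):
# about 3 million heterozygous sites over about 3 billion callable bases.
estimate = per_site_heterozygosity(3_000_000, 3_000_000_000)
print(f"approximate per-site heterozygosity: {estimate:.4f}")
```

With counts of that order the result is on the same scale as the 0.001 default, which is why the human value is a useful point of comparison.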
There may also be no consensus in your field, in which case the best option may be to run your pipeline with a range of heterozygosity values and compare the results.
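A minimal sketch of such a sweep, assuming the calls are made with GATK4 HaplotypeCaller and its `--heterozygosity` argument. The file names, sample name, and the list of values are hypothetical placeholders; the script only builds and prints the commands so they can be checked before anything is run.

```python
import shlex


def build_haplotypecaller_commands(reference, bam, sample, het_values):
    """Build one HaplotypeCaller command per heterozygosity value to try."""
    commands = []
    for het in het_values:
        # Encode the value in the output name so the runs can be told apart.
        out_vcf = f"{sample}.het_{het:g}.vcf.gz"
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-O", out_vcf,
            "--heterozygosity", f"{het:g}",
        ])
    return commands


# Hypothetical inputs: replace with your own reference, BAM, and values.
for cmd in build_haplotypecaller_commands(
    "buffalo_reference.fasta", "animal1.bam", "animal1", [0.0005, 0.001, 0.002]
):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Comparing the number and identity of heterozygous sites across the resulting VCFs should show how sensitive the calls are to this prior for your data.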