Question

Other independent methods or ways to confirm potential candidate genes observed through variant calling and homozygosity analysis

0

17 months ago

somatohadidas • 0

Hi folks,

I need your insights and suggestions. I am currently working with data on a recessive lethal phenotype in an organism. To pinpoint the molecular basis of this phenotype, I worked with tens of WGS samples from this organism. First, I carried out preprocessing (QC and the like), then moved on to mapping and variant calling with GATK (best practices). The GATK steps were as follows:

MarkDuplicates -> AddOrReplaceReadGroups -> HaplotypeCaller -> CombineGVCFs -> GenotypeGVCFs -> SelectVariants -> VariantsToTable.

After this, I used QTLseqr, PLINK and detectRUNS for the runs of homozygosity analysis (hoping to detect the variant(s) responsible for the trait), and I obtained some potential candidate genes.

My question now is this: what are the other INDEPENDENT ways to confirm these genes without going for functional validation, CRISPR screening and the like?

This is to corroborate my results from the above pipeline, and to unequivocally prove that there's nothing wrong with any step of the pipeline.

Thanks in anticipation.

candidate-genes plink Variant-calling Homozygosity-analysis • 1.0k views

score 0 · Answer 1 · 2023-06-12

0

17 months ago

LChart 4.6k

This:

(i) what are the other INDEPENDENT ways to confirm these genes

is different from this:

(ii) to unequivocally prove that there's nothing wrong with any step of the pipeline.

The answer to (ii) can be achieved in a straightforward way: download any dataset with established genotype calls (ideally for your organism, but not necessarily), run your pipeline on it, and compare your variant and genotype calls against the established ones. You should expect high (~97+%) recall and high (95+%) non-reference genotype concordance.
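As a rough illustration of that comparison, here is a minimal Python sketch. It assumes both call sets have already been parsed into dictionaries keyed by (chrom, pos, ref, alt) with genotype strings as values; the function names and the toy data are hypothetical. The concordance definition used here (matching genotypes over all sites that are non-reference in either set) is one common convention, so check which one your benchmark uses.

```python
def normalize_gt(gt):
    """Make '1/0', '1|0' and '0/1' compare equal by ignoring phase and allele order."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def is_non_ref(gt):
    """True if the genotype carries at least one non-reference allele."""
    return any(a not in ("0", ".") for a in normalize_gt(gt).split("/"))


def compare_calls(truth, calls):
    """Return (recall, non-reference genotype concordance) of calls against truth."""
    truth_nonref = {k for k, gt in truth.items() if is_non_ref(gt)}
    call_nonref = {k for k, gt in calls.items() if is_non_ref(gt)}

    # Recall: fraction of true non-reference sites that the pipeline also called.
    recall = len(truth_nonref & call_nonref) / len(truth_nonref) if truth_nonref else 0.0

    # Concordance: among sites that are non-reference in either set,
    # the fraction where both sets report the same genotype.
    either = truth_nonref | call_nonref
    concordant = sum(
        1
        for k in either
        if k in truth and k in calls and normalize_gt(truth[k]) == normalize_gt(calls[k])
    )
    nrc = concordant / len(either) if either else 0.0
    return recall, nrc


# Toy data (made up for illustration only).
truth = {
    ("2L", 100, "A", "G"): "0/1",
    ("2L", 200, "C", "T"): "1/1",
    ("2L", 300, "G", "A"): "0/1",
    ("2L", 400, "T", "C"): "0/0",
}
calls = {
    ("2L", 100, "A", "G"): "1|0",
    ("2L", 200, "C", "T"): "0/1",
    ("2L", 400, "T", "C"): "0/0",
    ("2L", 500, "A", "T"): "0/1",
}
recall, nrc = compare_calls(truth, calls)
print(f"recall={recall:.3f} non-ref concordance={nrc:.3f}")
```

For real data you would more likely use an established comparison tool; the sketch is only meant to make the two metrics concrete.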

The answer to (i) depends greatly on the extent of research already performed, the organism, and the phenotype. Are the genes part of established pathways relevant to the trait? Do existing linkage analyses implicate the gene at a nominal level? Is there evidence from differential expression?

0

Thank you so much for your suggestion, and sorry for the late response. I had planned on carrying out your first suggestion, but the person with the data I wanted to use is currently travelling and not reachable, at least until next week.

Regarding the questions you posed in the second paragraph: yes, the genes are part of established pathways relevant to the trait. They are orthologs of well-annotated genes in Drosophila (the organism I am dealing with is a fly) that have the same phenotype. Only one of them was found in the differential expression analysis.

ADD REPLY • link 17 months ago by somatohadidas • 0

score 0 · Answer 2 · 2023-06-12

It depends on how you want to validate it ...

Scenario 1: You want to prove the variant exists.

Here, your best bet is to do Sanger sequencing of that region. As LChart said, this will not prove that your pipeline is perfect, but it would be very strong evidence that the variant exists. However, it tells you nothing about whether the variant does anything. Sanger sequencing is cheap, fast and easy. LChart already described additional in silico validation above, so I will leave that out.

Scenario 2: You are pretty sure the variant exists, so you want to provide evidence that it has an effect.

First, you need to know what kind of effect you expect the variant to have. For example, a nonsense or frameshift mutation will truncate or alter the protein, and therefore almost always results in a loss of function. For missense variants, you can look at what the original amino acid was, what the new amino acid is, and where it is in the protein to infer what effect it may be having. There are also sophisticated in silico tools that will do this.
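To make the variant categories concrete, here is a minimal Python sketch that classifies a change within a coding sequence. It assumes the standard genetic code, a forward-strand CDS that starts at codon position 0, and 0-based coordinates within that CDS; the function name `classify_variant` and the toy sequence are hypothetical. Real annotation tools also handle splicing, strand and multiple transcripts, which this does not.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_variant(cds, pos, ref, alt):
    """Classify a variant at 0-based position `pos` of a coding sequence."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the coding sequence")

    # Insertions and deletions: the reading frame shifts unless the
    # length change is a multiple of three.
    if len(ref) != len(alt):
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"
    if len(ref) != 1:
        raise ValueError("only single-base substitutions and indels are handled")

    # Single-base substitution: compare the reference and altered codons.
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


# Toy CDS: ATG AAA TGG TAA (Met-Lys-Trp-stop), made up for illustration.
cds = "ATGAAATGGTAA"
print(classify_variant(cds, 8, "G", "A"))   # TGG -> TGA
print(classify_variant(cds, 3, "A", "G"))   # AAA -> GAA
print(classify_variant(cds, 3, "AA", "A"))  # one-base deletion
```

Nonsense and frameshift calls from a sketch like this are the ones you would usually treat as likely loss of function; missense calls need the extra position-in-protein reasoning described above.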

Once you know what type of variant you have (loss of function, gain of function, affects expression, etc.), you want to see whether other variants of the same kind in the same gene produce a phenotype similar to what you are seeing, and then see what kind of experiments were done to prove that.

The tools that you can use to do this depend on what the organism is. If it is a human genome, you could use a well-annotated database like ClinVar. If you think your variant is loss of function (LoF), and other LoF variants have been shown to produce a phenotype similar to the one you see, that kind of evidence is regarded as strong enough to be used by doctors to treat patients. In other words, while it's not infallible, it is regarded as fairly strongly confirmatory, and importantly it's also FREE to do.

If it's an organism about which not as much is known, you can still take a related approach. For instance, suppose the gene has high identity with a well-known gene in humans, mice, Drosophila, C. elegans, etc. If similar "types of" variants in other species do something similar, it doesn't prove anything, but it is a good place to start to see if you want to spend time and/or money doing more work.
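If you want a number for "high identity", a minimal Python sketch follows. It assumes you already have a pairwise alignment of the two protein sequences (for example, exported from an alignment tool) with gaps written as "-"; the function name and the toy sequences are hypothetical. Note that the denominator used here (all aligned columns, including gapped ones) is only one of several conventions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        return 0.0

    # A match is the same residue in both sequences (a gap never matches).
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment (made up for illustration only).
candidate = "MKT-LLV"
ortholog = "MKSALLV"
print(f"{percent_identity(candidate, ortholog):.1f}% identity")
```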

Scenario 3: You want to first confirm the variant exists, then prove it has an effect.

Essentially, for this, you follow the steps for scenarios 1 and 2, then you design custom functional assays to show that the variant has an effect. It sounds like you don't need to do the proving part, so I'll leave that out.