Question

What is the logic behind identifying genes present within 500 kb (on both sides) of a SNP locus?

2.1 years ago

anandprem1792 ▴ 60

Can someone please explain the logic behind identifying genes present within 50 kb, 100 kb and 500 kb (on both sides) of a SNP locus? How does the SNP affect the function of the genes present within these windows?

bedtools GWAS genetics SNP • 688 views

updated 17 months ago by Ram 44k • written 2.1 years ago by anandprem1792 ▴ 60

score 2 · Answer 1 · 2022-10-10

Several studies plot the density (or effect size) of eQTLs by distance to the gene TSS, for example:

[Figure: eQTL density as a function of distance to the gene TSS]

As you can see, these densities are highly peaked around the TSS but heavy-tailed. The reason to go further than 100kb or 250kb is to capture long-range eQTLs that can arise from chromatin looping, which brings opposite sides of a topologically associating domain (TAD) into close proximity. (Sensitivity)
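To make the window idea concrete, here is a minimal sketch of the lookup itself: list the genes whose TSS falls within a given distance on either side of a SNP. The gene names, TSS coordinates and SNP position are made up for illustration, and everything is assumed to sit on a single chromosome.

```python
# Hypothetical toy data: gene name -> TSS position (bp), all on one chromosome.
GENE_TSS = {
    "geneA": 1_020_000,
    "geneB": 1_090_000,
    "geneC": 1_400_000,
    "geneD": 2_100_000,
}


def genes_within(snp_pos, window_bp, gene_tss=GENE_TSS):
    """Return genes whose TSS lies within +/- window_bp of the SNP, nearest first."""
    hits = [
        (abs(tss - snp_pos), name)
        for name, tss in gene_tss.items()
        if abs(tss - snp_pos) <= window_bp
    ]
    return [name for _, name in sorted(hits)]


snp = 1_000_000  # hypothetical SNP position
for window in (50_000, 100_000, 500_000):
    # A wider window picks up more distant candidate genes.
    print(f"+/-{window // 1000} kb:", genes_within(snp, window))
```

With this toy data the 50 kb window returns only the nearest gene, while the 500 kb window also picks up a gene 400 kb away, which is the kind of long-range candidate a narrow window would miss. On real data the same kind of lookup is typically done on BED files, for example with `bedtools window` and its `-w` option.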

A reason not to go further than 100kb or 250kb is power: incorporating more genes (or more SNPs) into your analysis increases the number of tests, so with insufficient data you necessarily lose power to detect what you're looking for and run the risk of false negatives (and potentially false positives as well) by diluting the true signal. (Specificity)
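As a rough illustration of the power cost, the sketch below shows how a per-test significance threshold tightens as the window grows. It assumes a plain Bonferroni correction and that the number of SNP-gene pairs scales linearly with window size. Both are simplifications (real eQTL analyses often use permutation- or FDR-based corrections instead), and the pair counts are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical numbers of SNP-gene pairs tested at each window size,
# assuming the count grows linearly with the window.
pairs_tested = {
    "100kb": 1_000_000,
    "250kb": 2_500_000,
    "500kb": 5_000_000,
}

for window, n_pairs in pairs_tested.items():
    # More pairs tested means each association must clear a stricter threshold.
    print(window, f"{bonferroni_threshold(n_pairs):.1e}")
```

Under these assumptions, going from 100kb to 500kb makes the threshold five times stricter, so an association that is detectable in the narrow analysis may no longer pass in the wide one unless the sample size is large enough.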