If you test, say, 100,000 SNPs on a SNP array for some phenotype, my understanding is that you would correct for 100,000 tests. What about more in-depth studies? Do people correct for all >3 million sites in whole-genome sequencing (i.e. variable nucleotides relative to hg19), presuming there is no epistasis? Or do they correct based on the number of known variants, or some other smaller number?
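For context, here is a minimal sketch of how the per-test p-value cutoff scales with the assumed number of tests under a plain Bonferroni correction (the function name is mine, invented for illustration; the 5e-8 figure is the conventional genome-wide significance threshold, which corresponds to roughly one million independent common variants):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise
    error rate at alpha across n_tests independent tests."""
    return alpha / n_tests

# 100,000 SNPs on a genotyping array:
array_cutoff = bonferroni_threshold(0.05, 100_000)    # 5e-7

# ~3 million variable sites from whole-genome sequencing:
wgs_cutoff = bonferroni_threshold(0.05, 3_000_000)    # ~1.7e-8

# The conventional GWAS threshold of 5e-8 is equivalent to
# correcting for ~1 million independent common variants:
gwas_cutoff = bonferroni_threshold(0.05, 1_000_000)   # 5e-8
```

In practice the effective number of independent tests is smaller than the raw site count because nearby variants are in linkage disequilibrium, which is part of why a fixed convention like 5e-8 is used rather than correcting for every sequenced base.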
So common variants, or non-damaging variants, are filtered out before the statistics are run?
non-damaging: yes (unless they are rare and in a location where functional characterization isn't obvious, such as non-coding regions)
common: depends
I like to check for variants in the GWAS catalog, but the users that I have interacted with haven't typically been very interested in those results.
A sufficiently large SNP array will probably cover most common variants. Unless you are working with a common disease that, for some reason, hasn't been studied with SNP arrays, I think you can assume the low-hanging fruit has already been identified (although replication of findings from other studies is still worth noting).