Here is my take on it:
The short answer is: yes. You can define a hard threshold to binarize your methylation data, and as far as I know, most methylation-related papers do this.
The long answer is: yes, you can define a threshold, but you should do it in a way that helps explain your phenotype of interest, e.g. gene expression. In this sense, it is also important to know whether you want to work with probe-level or gene-level data.
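A global hard threshold is the simplest form of binarization. The sketch below assumes beta values in [0, 1] and uses a cut-off of 0.5 purely for illustration. Both that value and the function name `binarize_betas` are hypothetical, not taken from any particular paper or pipeline.

```python
# Hypothetical global cut-off; 0.5 is an illustrative assumption, not a standard value.
BETA_CUTOFF = 0.5


def binarize_betas(betas, cutoff=BETA_CUTOFF):
    """Map each beta value (0..1) to 1 ("methylated") if it is >= cutoff, else 0."""
    return [1 if b >= cutoff else 0 for b in betas]


print(binarize_betas([0.05, 0.48, 0.5, 0.93]))  # [0, 0, 1, 1]
```

The same cut-off is applied to every probe here, which is exactly the simplification the longer answer argues against.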
Let's say you are working with probe-level data. Most of the time, people are then interested in the effect of methylation on transcript levels. This requires you to identify which probe is the most informative for a given gene, and which cut-off on the beta value (B) best separates the samples showing down-regulation of that gene from the normal ones. This threshold may differ from gene to gene, depending on coverage, promoter sensitivity, the CG content of the region, etc. For example, you sometimes see hypermethylated promoter regions (B ~ 1) for a gene that shows no differential regulation at all. In such cases, would it make sense to threshold the methylation data and call these probes/genes methylated? It depends on what you want to accomplish with your binary data.
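One way to pick a gene-specific cut-off along these lines is to scan candidate thresholds on a probe's beta values and keep the one that best separates down-regulated samples from the rest. The sketch below assumes you already have, for each sample, a boolean label saying whether the gene is down-regulated relative to the normal samples. How you define that label, the use of balanced accuracy as the score, and the helper name `best_cutoff` are all assumptions for illustration.

```python
from typing import Sequence, Tuple


def best_cutoff(betas: Sequence[float], downregulated: Sequence[bool]) -> Tuple[float, float]:
    """Scan candidate cut-offs for one probe; return (cutoff, balanced_accuracy).

    Scored rule: a sample is called "methylated" if beta >= cutoff, and we measure
    how well that call matches the (assumed) down-regulation labels.
    """
    if len(betas) != len(downregulated):
        raise ValueError("betas and labels must have the same length")
    n_pos = sum(downregulated)
    n_neg = len(downregulated) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both down-regulated and other samples")

    values = sorted(set(betas))
    # Candidate cut-offs: midpoints between consecutive distinct beta values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]

    best = (candidates[0], -1.0)
    for t in candidates:
        tp = sum(1 for b, d in zip(betas, downregulated) if b >= t and d)
        tn = sum(1 for b, d in zip(betas, downregulated) if b < t and not d)
        score = 0.5 * (tp / n_pos + tn / n_neg)  # balanced accuracy
        if score > best[1]:
            best = (t, score)
    return best
```

For betas `[0.1, 0.2, 0.7, 0.8]` with labels `[False, False, True, True]`, the scan picks a cut-off of about 0.45 with a balanced accuracy of 1.0. A best score close to 0.5 would suggest that no threshold on this probe separates the groups, as in the hypermethylated-but-not-regulated case mentioned above.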
I think whatever approach you use should be fine, as the field does not have a standard way of doing this; everybody seems to be going their own way nowadays. As long as you are aware of the artifacts your pipeline might introduce, the simple binary approach may be the easiest way to go, but it is not necessarily the best for explaining biological mechanisms and phenotypic effects.
Oh, and you might find the following TCGA guideline useful: https://confluence.broadinstitute.org/display/GDAC/Methylation+Preprocessor
I would like to distinguish the effect of methylation on the expression of each specific gene, so, yes, probe-level binarization might be the best way to go (binarizing the methylation data for each gene differently). If I were to do this, what would be the best way to pick the threshold for each gene?
In that case, I don't think you need to binarize the methylation data at all. You can simply correlate the B value of each probe with the expression level of the gene of interest. As described in the TCGA guideline above, you should take the most anti-correlated probe. Once you have done this for every gene's expression against its corresponding methylation probes, you can decide on the effect of methylation by looking at these correlation values.
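The probe-selection step described above could be sketched as follows. This assumes the beta values and expression values are aligned by sample and uses Pearson correlation. The choice of correlation measure is an assumption (Spearman is a common alternative), the probe IDs are hypothetical, and this is not the TCGA preprocessor's actual implementation.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant input: correlation is undefined
    return sxy / math.sqrt(sxx * syy)


def most_anticorrelated_probe(probe_betas: Dict[str, Sequence[float]],
                              expression: Sequence[float]) -> Tuple[str, float]:
    """Return (probe_id, r) for the probe whose beta values are most negatively
    correlated with one gene's expression across the same samples."""
    scored = [(pid, pearson(b, expression)) for pid, b in probe_betas.items()]
    scored = [(pid, r) for pid, r in scored if not math.isnan(r)]
    if not scored:
        raise ValueError("no probe with a defined correlation")
    return min(scored, key=lambda item: item[1])
```

For example, with expression `[10.0, 8.0, 4.0, 2.0]` and two hypothetical probes `{"cg_A": [0.1, 0.3, 0.6, 0.9], "cg_B": [0.5, 0.4, 0.6, 0.5]}`, `cg_A` is selected with r of roughly -0.99. Repeating this per gene gives one correlation value per gene, which you can then inspect to judge how strongly methylation tracks expression.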