Hi,
As usual, it all depends on which question you want to answer.
If you want an idea of the global quantity of repeats in your genome, a quick annotation with RepeatMasker/Repbase can be enough. Or is there a public annotation layer already available?
But in your case, because you are searching for a link between binding sites and Transposable Element (TE) presence/absence, you should go for a deeper annotation of repeats. Maybe a TE database dedicated to your species exists and could be used with TEannot (REPET pipeline) or RepeatMasker to obtain a better annotation.
If the available databases are too distant from the species you are analysing, or if no data are available, you should go for a de novo detection/annotation of repeats. This is a big task, but if you want to be exhaustive, you have no choice.
What kind of normalization must one do to account for the large sequence length variation between different repeat families? For example, a SINE element could give many hits on the genome just because it is relatively short, whereas an LTR may not.
You are wrong on this point. Unless your SINE copies are shorter than log4(N) + 1 base pairs (where N is your genome size in base pairs), these copies are real and do not arise by chance. So you should not underestimate their importance in your analysis. Moreover, if copies were that short, it would mean the annotation is of very poor quality.
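To make the log4(N) + 1 rule of thumb concrete, here is a minimal Python sketch. The function names and the example genome size are my own choices for illustration, not part of any tool; it only evaluates the formula given above.

```python
import math


def random_match_length_threshold(genome_size_bp: int) -> float:
    """Return log4(N) + 1, where N is the genome size in base pairs.

    log4(N) is computed as log2(N) / 2.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be a positive number of base pairs")
    return math.log2(genome_size_bp) / 2 + 1


def is_longer_than_random(copy_length_bp: int, genome_size_bp: int) -> bool:
    """True if a copy is at least as long as the log4(N) + 1 threshold."""
    return copy_length_bp >= random_match_length_threshold(genome_size_bp)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # hypothetical 3 Gb genome, for illustration only
    print(f"threshold: {random_match_length_threshold(genome_size):.2f} bp")
    print("100 bp copy above threshold:", is_longer_than_random(100, genome_size))
```

Even for a genome of several gigabases the threshold stays below 20 bp, so a SINE copy of a hundred base pairs or more is far above it.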
So how do you validate your annotations in the end?
You could validate your annotation by validating the consensus sequences: check that each consensus you used has at least 3 complete copies in the genome. But you also have to be aware that TEs can diverge very fast, so a lot of degraded copies of the original TE are also present in the genome. That is why I prefer to use several consensus sequences (one for each main group of degraded copies), i.e. TE models, to describe and annotate the whole diversity of TEs.
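The "at least 3 complete copies" check can be sketched as follows. This is only an illustration under assumptions of mine: each hit is a tuple (consensus name, consensus length, start on consensus, end on consensus) that you have already extracted from your annotation output, and a copy counts as "complete" when it covers at least 95% of the consensus. Neither the tuple layout nor the 95% cutoff comes from REPET or RepeatMasker.

```python
from collections import Counter


def validated_consensus(hits, min_complete_copies=3, min_coverage=0.95):
    """Return the names of consensus sequences with enough complete copies.

    hits: iterable of (name, consensus_length, start, end), with 1-based
    inclusive coordinates on the consensus (an assumed, hypothetical layout).
    """
    complete = Counter()
    for name, consensus_length, start, end in hits:
        covered = end - start + 1
        if covered / consensus_length >= min_coverage:
            complete[name] += 1
    return {name for name, n in complete.items() if n >= min_complete_copies}


if __name__ == "__main__":
    # Made-up hits for illustration only.
    example_hits = [
        ("TE_A", 300, 1, 300),
        ("TE_A", 300, 5, 300),
        ("TE_A", 300, 1, 295),
        ("TE_A", 300, 100, 200),   # fragment, not counted as complete
        ("TE_B", 5000, 1, 5000),
        ("TE_B", 5000, 1, 1200),   # fragment
    ]
    print(validated_consensus(example_hits))
```

A consensus that fails this check is not necessarily wrong, since it may describe a family that survives only as degraded copies, which is the reason for using several TE models per family.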
Anyway, this is a very large subject with a lot of debate.
Here is a sample of publications for discovering the beautiful world of TEs and their annotation:
Hi! Have you tried RepeatMasker? http://www.repeatmasker.org/