When counting aligned reads that map to a gene (the exons), how do you compute the raw expression? Does an aligned read only need to overlap an exon to be counted, or does it need to map entirely inside an exon?
It probably depends on the biological questions you're asking.
If you have reads aligned with a spliced aligner, you can use htseq-count to address questions at the gene level since it will count reads that partially overlap exons according to rules described on the page you linked to. Those rules are also discussed here in the context of "how much do you believe your annotations", which might be helpful to you in deciding what kind of counting to perform.
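To make the overlap-versus-containment distinction concrete, here is a minimal sketch with made-up coordinates. It is not how htseq-count is implemented. It assumes unspliced reads given as half-open `(start, end)` intervals, and the function names are hypothetical.

```python
def overlaps(read, exon):
    """True if the two half-open intervals share at least one base."""
    return read[0] < exon[1] and exon[0] < read[1]


def contained(read, exon):
    """True if the read lies entirely inside the exon."""
    return exon[0] <= read[0] and read[1] <= exon[1]


def count_reads(reads, exons, require_containment=False):
    """Count reads that hit any exon under the chosen rule."""
    test = contained if require_containment else overlaps
    return sum(1 for r in reads if any(test(r, e) for e in exons))


# Made-up gene with two exons and four reads (illustrative only).
exons = [(100, 200), (300, 400)]
reads = [(120, 170), (180, 230), (250, 290), (390, 440)]

print(count_reads(reads, exons))                            # partial overlap allowed
print(count_reads(reads, exons, require_containment=True))  # must be fully inside
```

With these toy intervals the permissive rule counts three reads and the strict rule counts one, which is the kind of difference the choice of counting mode can make.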
This will work even if you don't have reads from a spliced aligner. Keep in mind that the annotations you use (e.g., full genes vs just exons) can influence the results. Once you have the counts for each gene you can normalize to gene length and library size to get an RPKM-like value, which will be correlated with "raw expression".
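As a rough illustration of that normalization, the sketch below uses the usual RPKM definition (reads per kilobase of feature per million mapped reads). The gene names, counts, lengths, and library size are invented for the example.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical values: two genes with the same raw count but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths_bp = {"geneA": 1000, "geneB": 4000}
total_mapped_reads = 10_000_000

values = {g: rpkm(counts[g], lengths_bp[g], total_mapped_reads) for g in counts}
for gene, value in values.items():
    print(gene, value)
```

Both genes have the same raw count, but the longer gene ends up with a quarter of the normalized value, which is why the annotation you choose (and therefore the length you divide by) matters.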
However, if you would like to address differential isoform expression, then htseq-count might not be what you want, since it deals with ambiguous or multimapping reads by ignoring them. Instead, check out Cufflinks or Scripture, which put a lot of effort into assigning initially ambiguous reads to specific isoforms. These tools also perform the normalization, so the results can be interpreted directly as expression.
The counts will have to include reads that map to the exons plus reads that map to the exon junctions. Then you need to normalize that count to the length of the transcript, and you'll end up with a value that correlates with the expression level of the transcript relative to the other transcripts.
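A minimal sketch of counting exon reads together with junction reads, assuming each spliced alignment has already been split into its aligned blocks (half-open `(start, end)` intervals). The read names, coordinates, and helper function are hypothetical, and the check is deliberately simple: it does not verify that a junction joins two adjacent annotated exons.

```python
def blocks_within_exons(blocks, exons):
    """True if every aligned block of the read lies inside some exon."""
    return all(
        any(ex_start <= b_start and b_end <= ex_end for ex_start, ex_end in exons)
        for b_start, b_end in blocks
    )


# Made-up transcript with two exons.
exons = [(100, 200), (300, 400)]

# Made-up alignments: one block for an unspliced read, two for a junction read.
reads = {
    "exon_read": [(120, 170)],
    "junction_read": [(180, 200), (300, 330)],
    "intron_read": [(190, 240)],
}

transcript_count = sum(blocks_within_exons(b, exons) for b in reads.values())
print(transcript_count)
```

Here the exon read and the junction read are both counted, while the read that runs into the intron is not.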
The tool that you link to most likely cannot do this.
If you're seeing reads across exon-intron junctions, perhaps you are observing intron retention, an interesting phenomenon related to regulation (alternatively, it could be some contaminating genomic DNA). If you want to quantify the effect, i.e. count events that correspond to defined structures, then you have to count only the reads that map cleanly to those structures. If you have a lot of reads that map outside the bounds of your structures, then I'd say either your structures are not sufficient, or your quantification of them will be a little bit fishy if you allow those reads into the expression count.
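One way to get a feel for how many reads fall outside your structures is to tally them by category before deciding what to count. The sketch below is only an illustration with invented coordinates and category names, again treating reads as unspliced half-open intervals.

```python
from collections import Counter


def classify(read, exons):
    """Label a read as 'exonic', 'boundary' (crosses an exon edge), or 'outside'."""
    start, end = read
    for ex_start, ex_end in exons:
        if ex_start <= start and end <= ex_end:
            return "exonic"
    for ex_start, ex_end in exons:
        if start < ex_end and ex_start < end:
            return "boundary"
    return "outside"


# Made-up exons and reads.
exons = [(100, 200), (300, 400)]
reads = [(120, 170), (180, 230), (250, 290), (390, 440), (310, 360)]

tally = Counter(classify(r, exons) for r in reads)
print(dict(tally))
```

A large "boundary" or "outside" share relative to "exonic" would be a hint that the annotation is incomplete or that something like intron retention or genomic DNA contamination is present.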
Just to be clear, the normalization by length of transcript is orthogonal to the counting. Whether or not normalizing by gene length is important depends on the applications downstream of the counting. In some cases, normalizing by gene length is counterproductive; for example, count-based differential expression tools such as DESeq and edgeR expect raw, unnormalized counts as input.
Are you using an aligner that can deal with splicing? If so, that will need to factor into the answer.