I tried to find more details about the empirical quality score (specifically, Figure 3 in the paper: http://www.nature.com/ng/journal/v43/n5/abs/ng.806.html) used in the comparison in the GATK paper on quality score recalibration, but I could not find any. Can someone give me some idea of how empirical quality scores are calculated and obtained? As I understand it, there is sequencing bias related to base-pair composition and read length, so the recalibrated quality scores should be more accurate. Still, it would be great if someone could explain or elaborate on the concept of the empirical quality score a little more.
Have you looked at the methods section of the paper you are referring to? It seems to describe the mathematical background of base quality recalibration. I'm sorry I can't give you a better answer; my own understanding of the procedure doesn't extend any further.
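For what it's worth, here is a rough sketch of the general idea as I understand it, not the paper's exact formula. Bases are grouped by their reported quality, mismatches against the reference are counted at sites not known to be variant, and the observed mismatch rate is converted to the Phred scale, Q = -10 * log10(error rate). The function names, the input format, and the pseudocount smoothing are my own assumptions for illustration, so check the methods section for what GATK actually does.

```python
import math
from collections import defaultdict


def empirical_quality(mismatches, observations, pseudocount=1):
    """Phred-scaled observed error rate.

    The pseudocount is an assumed smoothing term that avoids log10(0)
    when a group has no mismatches; it is not taken from the paper.
    """
    error_rate = (mismatches + pseudocount) / (observations + 2 * pseudocount)
    return -10 * math.log10(error_rate)


def empirical_by_reported(bases, pseudocount=1):
    """Map each reported quality to its empirical quality.

    `bases` is a hypothetical iterable of (reported_q, is_mismatch) pairs,
    one per base call at a site not known to be variant.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [observations, mismatches]
    for reported_q, is_mismatch in bases:
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)
    return {
        q: empirical_quality(m, n, pseudocount)
        for q, (n, m) in counts.items()
    }


if __name__ == "__main__":
    # Toy data: 1000 bases reported as Q30, 10 of which mismatch the reference.
    toy = [(30, i < 10) for i in range(1000)]
    for q, emp in empirical_by_reported(toy).items():
        print(f"reported Q{q}: empirical Q{emp:.1f}")
```

If the reported scores were well calibrated, the empirical value for each group would land close to the reported one. As far as I can tell, plotting one against the other is the kind of comparison that figure is making.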