What I don't understand is how this method doesn't simply pull out the phylogenetic tree. Why doesn't shared evolutionary history bias this procedure?
It seems to me that covariance is originally determined by the thermodynamics governing protein/peptide folding, and that those covariances are then propagated through evolutionary descent. The questions being asked of SCA and of evolutionary history have some overlap.
SCA may pick up on features missed by evolutionary history. For example, imagine that a pair of mutations in a hemoglobin protein confers resistance to a mammalian parasite. That pair of mutations may have occurred in a common ancestor of all mammals, in which case SCA will produce the same result as the phylogenetic tree. But it may also have arisen later, independently in multiple species, and propagated from there. Phylogeny may not identify covariance in this sort of feature, while SCA might.
Shared evolutionary history will probably bias the answers of SCA, but that bias doesn't necessarily invalidate the results. (Sometimes bias makes it easier to get the right answer. Bias isn't guaranteed to be bad.) If a multiple-feature covariance has propagated through many generations of many species, is it no longer interesting?
In the following article, the authors try to derive residue contacts from correlated columns in multiple alignments. They use mutual information (plus a direct coupling measure), not statistical coupling, but it may still help you find an answer.
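To make the "correlated columns" idea concrete, here is a minimal sketch of plain mutual information between two columns of a multiple alignment. It is not the method from the article and it is not SCA. The toy alignment and the function names (`column`, `mutual_information`) are made up for illustration, and the score is uncorrected: it does nothing about shared ancestry, so near-identical sequences from close relatives will inflate it.

```python
import math
from collections import Counter

def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Raw mutual information (in bits) between alignment columns i and j.

    Assumes all sequences have equal length. No gap handling, no sequence
    weighting, and no phylogeny correction are applied.
    """
    n = len(alignment)
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical toy alignment: columns 0 and 1 always change together,
# column 2 is fully conserved.
toy_alignment = [
    "AKLE",
    "AKLE",
    "GRLE",
    "GRLD",
]

print(mutual_information(toy_alignment, 0, 1))  # covarying pair
print(mutual_information(toy_alignment, 0, 2))  # conserved column carries no signal
```

Note that the two identical `AKLE` rows could just as well be two close relatives as two independent events, and the raw score cannot tell the difference. That is the phylogenetic bias being discussed above.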