Please excuse how basic this question is; I'm a bioinformatics newbie. I'm looking at the 7-way primate genome alignment in Ensembl release 76 (ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_7_primate), and I don't understand how it decides which bases should be (soft-)masked. For example, in the first file (chr1_1.emf), around line 250 there is a run of T's followed by some other bases that is consistently masked for some genomes and not others, even though the sequences are identical. What's going on here?
Odds are good that the masking comes originally from RepeatMasker, in which case differences in the repeat databases used for each organism could cause what you're seeing (obviously, it would take someone from Ensembl to give you the real answer).
The columns you are referring to are from the predicted ancestral sequences, and we (Ensembl) don't repeat-mask ancestral sequences, so they stay unmasked even where the extant sequences next to them are soft-masked. In the EMF file, each column is one sequence, as indicated by the SEQ elements in the header. You can see at the beginning of the file that extant species are mixed with ancestral ones.
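If it helps to check this yourself, here is a rough Python sketch that pairs each alignment column with its SEQ header entry and counts the soft-masked (lowercase) bases per column. It assumes the usual EMF layout: `SEQ` lines whose second field is the organism name, then a `DATA` line, then one alignment position per row with one character per sequence in SEQ order, ending with `//`. It also assumes ancestral entries are labelled `ancestral_sequences`. The function names and the tiny sample block are made up for illustration and are not taken from chr1_1.emf.

```python
def parse_emf_block(lines):
    """Return (labels, columns) for the first alignment block in `lines`.

    labels[i] is the organism field of the i-th SEQ line;
    columns[i] is the aligned sequence read down column i of the DATA rows.
    """
    labels, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## header comments
        if line == "//":
            if in_data:
                break  # end of the first block
            continue
        if in_data:
            # Keep only the first token, in case extra fields follow the bases.
            rows.append(line.split()[0])
        elif line.startswith("SEQ "):
            labels.append(line.split()[1])
        elif line == "DATA":
            in_data = True
    columns = ["".join(row[i] for row in rows) for i in range(len(labels))]
    return labels, columns


def masking_summary(labels, columns):
    """List of (label, soft-masked base count, total base count) per column."""
    return [
        (label, sum(c.islower() for c in col), sum(c.isalpha() for c in col))
        for label, col in zip(labels, columns)
    ]


# Hypothetical miniature block, for illustration only.
SAMPLE = """\
##FORMAT (compara)
SEQ homo_sapiens 1 100 104 1 (chr_length=1000)
SEQ ancestral_sequences Ancestor_1_1 1 5 1 (chr_length=5)
SEQ pan_troglodytes 1 200 204 1 (chr_length=1000)
DATA
tTt
tTt
aAa
CCC
G-G
//
"""

labels, columns = parse_emf_block(SAMPLE.splitlines())
for label, masked, total in masking_summary(labels, columns):
    print(f"{label}: {masked} of {total} bases soft-masked")
```

On a real file you would pass the lines of the opened file instead of `SAMPLE.splitlines()`. Columns labelled `ancestral_sequences` should then report zero soft-masked bases, while the extant species around them can be masked at the same positions.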