How does Ensembl decide what to mask?
10.1 years ago
dbweissman ▴ 10

Please excuse how basic this question is; I'm a bioinformatics newbie. I'm looking at the 7-way primate genome alignment in Ensembl release 76 (ftp://ftp.ensembl.org/pub/release-76/emf/ensembl-compara/epo_7_primate), and I don't understand how it decides which bases should be (soft-)masked. For example, in the first file (chr1_1.emf), around line 250 there is a run of T's followed by some other bases that is consistently masked for some genomes and not others, even though the sequences are identical. What's going on here?

masking Ensembl sequence genome alignment • 2.2k views

Odds are good that the masking comes originally from RepeatMasker, in which case differences in the repeat databases used for each organism could cause what you're seeing (obviously, it would take someone from Ensembl to give you the real answer).

10.1 years ago
Denise CS ★ 5.2k

The unmasked columns you are referring to are the predicted ancestral sequences, and we (Ensembl) don't repeat-mask ancestral sequences. In the EMF file, each column is a sequence (a species or an inferred ancestor), as indicated by the SEQ lines in the header. You can see at the beginning of the file that extant species are mixed with ancestral ones.
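If you want to check this yourself, here is a minimal Python sketch that reports, for each alignment block, which columns contain any lowercase (soft-masked) bases. It assumes the usual EMF layout: one `SEQ` header line per column with the organism name as the second field, a `DATA` line, then one line per alignment position with one character per column, and `//` closing the block. The function name `masked_columns` and the toy input are made up for illustration; they are not taken from the real chr1_1.emf.

```python
def masked_columns(emf_lines):
    """Return, per alignment block, a list of (organism, has_lowercase) pairs."""
    blocks = []
    species, has_lower, in_data = [], [], False
    for raw in emf_lines:
        line = raw.strip()
        if line.startswith("SEQ "):
            # Assumed header layout: SEQ <organism> <region> <start> <end> <strand> ...
            species.append(line.split()[1])
            has_lower.append(False)
        elif line == "DATA":
            in_data = True
        elif line.startswith("//"):
            # End of block: store the result and reset for the next block
            if species:
                blocks.append(list(zip(species, has_lower)))
            species, has_lower, in_data = [], [], False
        elif in_data:
            # One character per SEQ column; ignore anything after those columns
            for i, base in enumerate(line[:len(species)]):
                if base.islower():
                    has_lower[i] = True
    return blocks


# Toy block (invented coordinates): two extant species plus one ancestor
sample = [
    "SEQ homo_sapiens 1 100 102 1",
    "SEQ pan_troglodytes 1 200 202 1",
    "SEQ ancestral_sequences Ancestor_1_1 1 3 1",
    "DATA",
    "ttT",
    "ttT",
    "AAA",
    "//",
]

for block in masked_columns(sample):
    for organism, masked in block:
        print(organism, "soft-masked" if masked else "not masked")
```

On a real file you would pass the open file handle instead of `sample`. If the assumption about the header holds, the columns reported as never masked should be the ones whose SEQ line names an ancestral sequence rather than an extant species.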

Ha, I should have noticed that the unmasked ones were all the ancestral sequences... Thanks!
