Hello all,
I am working on aligning protein orthologs from different species using the Ensembl API. Strangely, some protein sequences from non-human species contain a lot of X characters. What does that mean? In theory, if the genome sequence is known, the protein sequence should be known too, right? And how should I score these X residues when I calculate conservation scores? Thanks a lot. An example is shown below: ENSMEUP00000002410 from Notamacropus eugenii.
MGLSGAAGAAVLVLLAGHFSLGTALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQKNYDLSFLKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXILVGGVRFNNNPTLCNVETIQWKDIVGSAYVSNITIDNNSHPKSXXXXXXXXXXXXXXXXXXXXXXXXTKTICAQQCSGRCRGSSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVRKCPHNYVVTDHGSCVRSCNAETYEVEEDGVRKCKKCEGPCSKVCNGIGIGEFKDVLSINATNIKQFQNCTTISGDLHILPVAFKGDSFTNTPPLDPKELNILRTVKEISGFLLIQAWPENMTDLHAFEHLEIIRGRTKQHGQFSLAVVGVDITSLGLRSLKEISDGDVIISKNRQLCYANTINWSKLFGTRSQKTKITNNKDEKECRALGHVCHELCSSDGCWGPSSSHCLSCRYVSRQKKCVEKCNILEGEPREYMENLKCLQCHPECLPQLMNQTCTGPGPDKCVQCAHYIDGPHCVKTCPAGIMGEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPKIPSIATGIVGGFLLLMVLVLGIGLFIRRRRIVRKRTLRRLLQEREXXXXXXLSPPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYIREHKDNIGSQYLLNWCVQIAKGMSYLEERRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSVLEKGERLPQPPICTIDVYMIMVKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSATSNTSATVCIDRNGQQTCPVKEESFIQRYSSDPTTVLLEDNVDDSFQPVP
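A minimal Python sketch for the two practical questions above, under stated assumptions. `X` is the standard one-letter code for an unknown or unspecified amino acid. A likely (but here unverified) source is gaps in the species' genome assembly, which cannot be translated into real residues. The helpers `x_runs`, `unknown_fraction` and `column_identity` are hypothetical names. `column_identity` is a simple identity-style score, not any particular published conservation measure, and it assumes that `X` (and the gap character `-`) should be treated as missing data rather than as a mismatch.

```python
import re
from collections import Counter


def x_runs(seq):
    """Return 0-based, half-open (start, end) intervals of consecutive 'X' residues."""
    return [(m.start(), m.end()) for m in re.finditer(r"X+", seq.upper())]


def unknown_fraction(seq):
    """Fraction of positions in the sequence that are the unknown-residue code 'X'."""
    return seq.upper().count("X") / len(seq) if seq else 0.0


def column_identity(column, ignore="X-"):
    """Hypothetical conservation score for one alignment column.

    Unknown residues ('X') and gaps ('-') are dropped instead of being counted
    as mismatches. The score is the fraction of the remaining (informative)
    residues that equal the most common residue. Returns None when fewer than
    two informative residues are left, because the score would be meaningless.
    """
    informative = [c for c in column.upper() if c not in ignore]
    if len(informative) < 2:
        return None
    top_count = Counter(informative).most_common(1)[0][1]
    return top_count / len(informative)
```

For example, `x_runs(seq)` on the sequence above lists each block of X, and `column_identity("LLXL")` returns 1.0 because the X is ignored. Ignoring X avoids penalising a species for gaps in its assembly; the trade-off is that columns with few informative residues become less reliable, which is why the sketch returns `None` when fewer than two remain.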
The ENSMEUP00000002410 identifier seems to pull up tammar wallaby entries.
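To double-check which species a stable ID belongs to, here is a sketch that calls the Ensembl REST API's `lookup/id` endpoint with Python's standard library. It assumes the REST API (the post may be using the Perl API instead) and that the JSON response carries a `species` field, as gene lookups do. `build_lookup_request` and `fetch_species` are hypothetical helper names, and the actual call needs network access.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_request(stable_id: str) -> urllib.request.Request:
    """Build a GET request for /lookup/id/<stable_id>, asking for a JSON response."""
    return urllib.request.Request(
        f"{ENSEMBL_REST}/lookup/id/{stable_id}",
        headers={"Content-Type": "application/json"},
    )


def fetch_species(stable_id: str) -> str:
    """Return the 'species' field of the lookup response (performs a network call)."""
    with urllib.request.urlopen(build_lookup_request(stable_id), timeout=30) as resp:
        return json.load(resp)["species"]
```

Usage: `fetch_species("ENSMEUP00000002410")` should return the Ensembl species name for that protein, which can be compared against the species you expected.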