Question

Interpret genome alignment results

0

5.1 years ago

el97004 ▴ 80

Hi all!

I assembled two different genomes and wanted to see how similar they are at both the nucleotide and protein levels, so I aligned their nucleotide sequences and their translated nucleotide sequences. Here are the results I obtained:

Nucleotide identity = 90%; protein identity = 57%

How would one make sense of this high nucleotide yet low protein identity result? I have been doing a lot of reading, and it seems that if the species are close it's better to use the DNA sequence to compare, and I believe these two species should be fairly close. However, I am still confused as to why the values would differ so much.

Thanks for your input!

alignment protein nucleotide • 1.4k views

ADD COMMENT • link updated 5.1 years ago by michael.ante ★ 3.9k • written 5.1 years ago by el97004 ▴ 80

0

There are lots of reasons for this, and all else being equal this is to be expected.

You need to clarify whether these are DNA sequences of genes, whole genomes, etc.

ADD REPLY • link 5.1 years ago by Joe 21k

0

Sorry I should have clarified. Whole genomes!

ADD REPLY • link 5.1 years ago by el97004 ▴ 80

1

It doesn't make any sense to translate the whole genome, and consequently even less to align/compare the translations.

ADD REPLY • link 5.1 years ago by Joe 21k

1

Exactly!! Only translate and compare protein-coding regions. For non-coding regions, DNA similarity can be high, but when ERRONEOUSLY translated, the "protein" sequences could come from different reading frames and therefore show very low similarity. Again, only translate and compare protein-coding regions.
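
To illustrate the reading-frame point, here is a minimal Python sketch. It assumes the standard genetic code, and the sequence and the `translate` helper are made up for this example, not taken from the genomes in question. The same DNA gives unrelated "proteins" depending on the frame it is read in:

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical example sequence, not real data.
seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

frames = [translate(seq, f) for f in range(3)]
for f, protein in enumerate(frames):
    print(f"frame {f}: {protein}")
```

The nucleotide string is identical in all three cases; only the frame changes, which is why translating unannotated whole-genome sequence and comparing the result is not meaningful.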

ADD REPLY • link 5.1 years ago by Cupton ▴ 80

score 1 · Answer 1 · 2019-10-19

1

5.1 years ago

michael.ante ★ 3.9k

Hi,

Small changes at the nucleotide level can lead to drastic changes at the protein level. In a worst-case scenario, a mutation in a gene's 5' region (for example, a single-base insertion or deletion) introduces a frameshift, which leads to a totally different product. In such a case you'll have nearly 100% identity at the nucleotide level but nearly none for the protein.
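
The frameshift case can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the standard genetic code, a made-up 60-nt coding sequence, and hypothetical helpers `translate` and `identity`. The nucleotide "alignment" is built by hand because the position of the inserted base is known, and the proteins are compared position by position without gaps, so a real aligner could report somewhat different numbers:

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate dna in frame 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def identity(a, b):
    """Fraction of identical, non-gap columns between two aligned strings."""
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# Hypothetical coding sequence, not real data.
ref = "ATGGCTAAAGTTCTGGAACGTCATGGTATTCCGGATTACAGCCTGTTTAAAGAATGGTAA"
# Insert a single base right after the start codon (a frameshift in the 5' region).
mut = ref[:3] + "C" + ref[3:]

# Nucleotide level: align by putting one gap in the reference at the insertion site.
nt_identity = identity(ref[:3] + "-" + ref[3:], mut)
# Protein level: ungapped, position-by-position comparison of the two translations.
prot_identity = identity(translate(ref), translate(mut))

print(f"nucleotide identity: {nt_identity:.1%}")
print(f"protein identity:    {prot_identity:.1%}")
```

With this toy sequence the nucleotide identity stays above 98% while only the initial methionine still matches at the protein level.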

Depending on your species, you have more or less "junk DNA": intergenic regions, introns, etc. These non-coding regions can increase the overall nucleotide identity, but not that of the proteins.

Cheers,

Michael

ADD COMMENT • link 5.1 years ago by michael.ante ★ 3.9k

0

Thank you, that makes sense. But how about less extreme scenarios? For example, if the third position of a codon is mutated, it could have no effect on the protein sequence.
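
The third-position case can be checked with a small Python sketch. It assumes the standard genetic code, and the short sequences and the `translate` helper are invented for illustration. Every codon between the start and stop codons is changed at its third base to a synonymous codon:

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate dna in frame 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical sequences, not real data.
ref = "ATGGCTAAAGTTCTGGAACGTTAA"  # ATG GCT AAA GTT CTG GAA CGT TAA
mut = "ATGGCCAAGGTACTCGAGCGATAA"  # third base changed in the six internal codons

# Equal lengths, so the sequences can be compared column by column.
nt_identity = sum(x == y for x, y in zip(ref, mut)) / len(ref)

print(f"nucleotide identity: {nt_identity:.0%}")
print(f"reference protein:   {translate(ref)}")
print(f"mutant protein:      {translate(mut)}")
```

Here the nucleotide identity drops to 75% while the two proteins are identical, the opposite pattern from the one in the question. Synonymous changes on their own therefore push protein identity above nucleotide identity, not below it.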

ADD REPLY • link 5.1 years ago by el97004 ▴ 80