Question

Comparing annotations between genomes

3

7.0 years ago

Ric ▴ 440

Hi, I used flo to map annotations from one genome assembly to another. The flo developers report the following result on their page:

For an ant genome (~350 Mb) we saw 90% annotations map identically to the new assembly (unpublished result).

How did they calculate the above percentage?

Thank you in advance.

annotation gff liftOver CrossMap • 2.1k views

updated 4.1 years ago by Priyam ▴ 20 • written 7.0 years ago by Ric ▴ 440

0

Thank you, but I am still not sure which files I should use.

> ls
input.cds.fa  input.gff  lifted_cleaned.cds.fa  lifted_cleaned.gff  lifted.gff3  unlifted.gff3  unmapped.txt

> grep "ID=" unlifted.gff3 | wc -l
19233
> grep "ID=" lifted_cleaned.gff | wc -l
33639
> wc -l unmapped.txt 
43632 unmapped.txt
> grep "ID=" input.gff | wc -l
45857
> python 
>>> float(33639*100)/45857
73.35630329066446

Do you think I chose the correct ones?

Thank you in advance

6.9 years ago by Ric ▴ 440

score 2 · Answer 1 · 2020-11-10

The coding sequences of the lifted gene models were extracted and checked for exact identity with the coding sequence of the corresponding input gene model. I do not remember whether I reported that number as a percentage of the gene models that were lifted or as a percentage of the input gene models. We can be more confident that a gene model was lifted correctly if the coding sequences of the input and lifted gene models are exactly identical. If the coding sequences are not identical, the lifted gene model may still be correct, but there is a higher chance that it was mapped to a duplicate. I believe the gff_compare.rb script in flo gives you the IDs of gene models that were either not lifted or had a non-identical coding sequence.
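The check described above could be sketched in Python roughly as follows. This is not flo's own code: it assumes that the two CDS FASTA files (for example input.cds.fa and lifted_cleaned.cds.fa from the listing earlier in the thread) use the same record ID for corresponding gene models, and the function names `read_fasta` and `cds_identity_summary` are made up for illustration. Both possible denominators are reported, since the original figure could have used either.

```python
import sys


def read_fasta(path):
    """Return {record_id: sequence} for a FASTA file (ID = first word of the header)."""
    records = {}
    current_id = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current_id is not None:
                    records[current_id] = "".join(chunks).upper()
                header = line[1:].split()
                current_id = header[0] if header else ""
                chunks = []
            else:
                chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks).upper()
    return records


def cds_identity_summary(input_fasta, lifted_fasta):
    """Count lifted CDS records that are exactly identical to the input CDS.

    Assumption: record IDs are the same in both files.
    """
    input_cds = read_fasta(input_fasta)
    lifted_cds = read_fasta(lifted_fasta)
    identical = sum(
        1 for rec_id, seq in lifted_cds.items() if input_cds.get(rec_id) == seq
    )
    return {
        "input": len(input_cds),
        "lifted": len(lifted_cds),
        "identical": identical,
        # Percentage of all input gene models that lifted with identical CDS.
        "pct_of_input": 100.0 * identical / len(input_cds) if input_cds else 0.0,
        # Percentage of lifted gene models whose CDS is identical.
        "pct_of_lifted": 100.0 * identical / len(lifted_cds) if lifted_cds else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_cds.py input.cds.fa lifted_cleaned.cds.fa
    print(cds_identity_summary(sys.argv[1], sys.argv[2]))
```

Sequences are compared case-insensitively here; drop the `.upper()` calls if soft-masking should count as a difference.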

score 0 · Answer 2 · 2017-12-20

0

7.0 years ago

Philipp Bayer 8.8k

Flo gives you a file of unlifted genes: those without mappings, those that had to be split up, etc. I guess they took the number of unlifted genes and subtracted it from the original number of genes?
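That guess could be sketched in Python as below. Note that `grep "ID="` counts every feature carrying an ID attribute (genes, transcripts and possibly exons), so this sketch counts only rows of one feature type instead. It assumes that gene models appear as `gene` rows in column 3 of both input.gff and unlifted.gff3, which may not hold for every annotation (some use `mRNA` as the top-level feature). The helper names `count_features` and `percent_lifted` are hypothetical.

```python
def count_features(gff_path, feature_type="gene"):
    """Count rows in a GFF file whose third column equals feature_type."""
    count = 0
    with open(gff_path) as handle:
        for line in handle:
            # Skip comments, directives and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 3 and columns[2] == feature_type:
                count += 1
    return count


def percent_lifted(n_input, n_unlifted):
    """Percentage of input features that are not in the unlifted set."""
    if n_input == 0:
        raise ValueError("input annotation contains no features of this type")
    return 100.0 * (n_input - n_unlifted) / n_input
```

For example, `percent_lifted(count_features("input.gff"), count_features("unlifted.gff3"))` would give the percentage of input genes that were lifted, under the assumptions stated above.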
