Hi,
I have genomes (FASTA files) of two species and the corresponding GFF3 files. I need to identify (a) genes that are unique to each species (< 30% similarity to any gene in the other species), (b) genes that are highly conserved, and (c) genes that are highly diverged. Can you please suggest how I can do that?
NB: these are eukaryotic genomes of ~15-16 Mb and contain very few introns.
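One common way to approach (a) to (c) is to search each species' proteins against the other's, for example with BLASTP and `-outfmt 6` (tabular, 12 default columns). You then take each gene's best hit and bin it by percent identity. The sketch below makes several assumptions. BLAST's `pident` column is identity, not similarity. Only the 30% cut-off comes from the question. The 60% and 90% thresholds and the bin names are hypothetical placeholders. A stricter analysis would use reciprocal best hits or a dedicated orthology tool instead of one-directional best hits.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Only the 30% cut-off is from the question; the others are hypothetical.
UNIQUE_MAX = 30.0      # best hit below this (or no hit at all) -> "unique"
DIVERGED_MAX = 60.0    # assumption: 30-60% identity -> "diverged"
CONSERVED_MIN = 90.0   # assumption: >= 90% identity -> "conserved"


def best_hits(blast_tab: TextIO) -> Dict[str, float]:
    """Percent identity of the highest-bitscore hit per query, read from
    BLAST tabular output (-outfmt 6: pident is column 3, bitscore column 12)."""
    best = {}  # query -> (bitscore, pident)
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, pident, bitscore = row[0], float(row[2]), float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, pident)
    return {q: pid for q, (_, pid) in best.items()}


def classify(all_genes: Iterable[str], blast_tab: TextIO) -> Dict[str, str]:
    """Label every annotated gene (e.g. IDs taken from the GFF3) by its best hit."""
    hits = best_hits(blast_tab)
    labels = {}
    for gene in all_genes:
        pident = hits.get(gene)
        if pident is None or pident < UNIQUE_MAX:
            labels[gene] = "unique"
        elif pident < DIVERGED_MAX:
            labels[gene] = "diverged"
        elif pident >= CONSERVED_MIN:
            labels[gene] = "conserved"
        else:
            labels[gene] = "intermediate"
    return labels


# Toy input with made-up hits; geneC has no hit at all.
example = io.StringIO(
    "geneA\tsp2_g1\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "geneB\tsp2_g7\t45.2\t210\t115\t3\t1\t210\t5\t214\t1e-30\t120\n"
    "geneB\tsp2_g9\t50.0\t50\t25\t0\t1\t50\t1\t50\t1e-5\t40\n"
)
print(classify(["geneA", "geneB", "geneC"], example))
# {'geneA': 'conserved', 'geneB': 'diverged', 'geneC': 'unique'}
```

Running the search in both directions (A vs B and B vs A) gives the unique sets for each species.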
Thank you for your help!
I would just like to add a word of caution: defining "species-specific" is tricky business. Keep in mind that such a label is only valid in the current context, i.e. for the genomes and annotations being compared.
Top tip: make the effort to check that genes have not been missed in either annotation (a quick tblastn will tell you).
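A minimal sketch of that check follows. It assumes you searched the "unique" proteins of one species against the other species' genome, with something like `tblastn -query unique_proteins.faa -db other_genome -outfmt 6`. The file and database names are hypothetical. It then flags proteins that still get a decent genomic hit, because they may be genes missing from the other annotation rather than truly species-specific. The e-value and identity cut-offs are assumptions.

```python
import csv
import io
from typing import Iterable, Set, TextIO


def possibly_missed(unique_genes: Iterable[str], tblastn_tab: TextIO,
                    max_evalue: float = 1e-5, min_pident: float = 30.0) -> Set[str]:
    """Return 'unique' proteins that nevertheless hit the other genome in
    tblastn tabular output (-outfmt 6: pident column 3, evalue column 11).
    Thresholds are hypothetical defaults."""
    candidates = set(unique_genes)
    flagged = set()
    for row in csv.reader(tblastn_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, pident, evalue = row[0], float(row[2]), float(row[10])
        if query in candidates and evalue <= max_evalue and pident >= min_pident:
            flagged.add(query)
    return flagged


# Toy input: geneC hits chr3 strongly; geneD's hit is not significant.
hits = io.StringIO(
    "geneC\tchr3\t62.5\t180\t67\t1\t1\t180\t100201\t100740\t1e-60\t210\n"
    "geneD\tchr1\t25.0\t40\t30\t0\t1\t40\t500\t619\t0.5\t20\n"
)
print(sorted(possibly_missed(["geneC", "geneD"], hits)))
# ['geneC']
```

Flagged genes are worth inspecting at the hit coordinates (sstart/send) before calling them species-specific.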
The answer given by h.mon is likely the best approach to start with and see what you get