Hi all,
I need to find the percent identity between every pair of orthologous genes in 4 different (but closely related) bacteria.
The dataset I have consists of the nucleotide sequences of each gene (that is, each ORF) in the genomes, plus the information on which gene is orthologous to which (based on OrthoMCL results). There are ~1500 orthologous groups, so in the end I hope to have ~1500 tables showing the percent identity among the genes in each group. Even better would be ~1500 percent identity ranges (lowest to highest pairwise value per group), since those are what I'm really after.
Is there software that does this? (Sorry, I haven't searched for it myself since I don't even know what to search for.)
If no such software exists, I'm thinking of building it myself, since I'm learning Python. Any suggestions for that? I'm thinking of using a global alignment algorithm such as Needleman–Wunsch, preferably on Windows (this is the only option available to me, but please don't hesitate to answer if you have a Linux solution).
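For reference, here is a minimal pure-Python sketch of the Needleman–Wunsch idea applied to this problem. Everything in it is an assumption made for illustration: the function names (`needleman_wunsch`, `percent_identity`, `identity_range`), the scoring values (match +1, mismatch -1, linear gap -2), and the definition of percent identity as matching columns divided by total alignment length including gap columns (other denominators are also in common use). It runs on Windows or Linux with only the standard library.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Scoring values are illustrative assumptions, not recommended defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns (gaps included) where both residues match."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        return 0.0
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def identity_range(group):
    """Return (min, max) percent identity over all pairs in one ortholog group.

    `group` is a hypothetical dict mapping gene name -> nucleotide sequence.
    """
    values = [percent_identity(group[g1], group[g2])
              for g1, g2 in combinations(sorted(group), 2)]
    return min(values), max(values)


if __name__ == "__main__":
    toy_group = {"geneA": "ACGT", "geneB": "ACGT", "geneC": "ACGA"}
    print(identity_range(toy_group))
```

Calling `identity_range` once per orthologous group would give the ~1500 ranges described above. Note that this sketch needs time and memory proportional to the product of the two sequence lengths, so for real genes an existing implementation is likely a better choice than plain Python lists, for example the `needle` program in EMBOSS or `Bio.Align.PairwiseAligner` in Biopython.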
Thank you.
(Edited to explain OS choice)
It is hard for me to imagine that Windows is the only available option since Linux is free. Since you seem to want to do some more bioinformatics work in the future, I will restate what has already been said elsewhere: Learn Linux, Python, and Bash thoroughly. This is simply essential. It is a long process, but everything you learn along the way is directly usable and useful. START NOW! ;)