Hi, I am using the Ka/Ks calculator software to identify rapidly evolving genes. I align my gene sequences with MUSCLE, convert the alignment to AXT format, and then run the tool. It works well with some alignments, but for others it gives me the error "Error. The size of two sequences in 'ID is not equal." I do not understand this error. If anybody has an idea, please help.
Thanks
Deepak
Input file that gives the error:
>ENSBTAT00000025915.4 ensembl:known_by_projection chromosome:UMD3.1:10:89053688:89074011:-1 gene:ENSBTAG00000019454.4 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGA-----TTGC-------------------------------------------------------------------------------------------------------------------------GTCGTGT----------------------------------------------------------------------------------------------------------------------CTGTGTTA---CCTGCTGCTGCCGGCCGC-----------------GCGCCTTTTCC-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GCGCCCTCT----------------------------------------------------------------------------------------------------------------------------------CAGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTCGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCTGTCCTCCCTATGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCACGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATTGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTACCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTG-------------------ATCGCAGGCTGTATCAGG--ATGGAACCCTCAAGCTCCTGGGCCGGCTCTCGCTCCTCTCTGAAGAGATCCTCTGGGCTGCCAACGGCTTACCCAACCC--CTTCTGCTCTTCAGA----------------CCAC--CTCTGCCTACTGGCTAGCTT--CGGGATGGAAATCGCGGCCCC---------------ATGA
ATGAGACGTTTGCTGCAGCGCTCAGGTCCTTTCACTGCAGCGCACAGACTCCCTAGTTGTGGCCTGAGTGGTCCAAAGGGCGCGGGTTCAGTAACTGCAGCACGTGGACTTCGTAGCTCCACGGCAGTCAATCAGTCGTGTCCAACTCTTTGCGACCCCATGGACTGCAGCACACCAGGCTTCCCTGTCCATCACCAACTTCCAGAGCCTGCTCAAACTCAAGTCCATCGAGTCAGTGATGCCATCCCACCATCTCATCCTCTGTCATCCCCTTCTCCTCCTGGCTTCAATCTTTCCCAGCATCAGTGTCTTTTCCAAGGCATTTTCAAGAAGAAGAGAGAGGCCAAGGATAAAATCCTAGAAACACCCTCATTGAAAGGTGGGTGTTCAGAGGAACAGAAAGAAGAGGTGCTAGGACAAAACGCCCACAGAGTTGGAAGGAAAGTAAGAATGTTACAGAAACCCGAGGAAAGAGACTCTTGTAGTTATGTGCTAGTTCCTAAGAGGCTTCCCAATCTTCAGGTTACCCCAGTGGCCTCTGCTTCATCCCTGGAGAAGCCTTGCAGTGTTATGATGTACCCAGCTGGTCCTGGATCTGGTGACTCTCCAGAGGGCTTCCCGGAACAACCAAGGCGCCAGTCCCAAACCGGAAATCGCACGGCTCAGACACTGGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTTGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCCGTCCTCCCTGTGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCATGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATCGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTATCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTGGCACGTGCTGCGAGCTTGAAGCAAAGGGAGCAGCAGGAAATGTGGCCATCCACCCAGTAGGCTGGCTTCTGGTCCTCTC----GGGAGCCTGTG--------ACGGC--ACTCAGCTCTGCGTCAGTTCCAAGAAAGTGGCCGTGCGGTATCCACGGTTCTGC--ACTGGCGACCGTGGTGAGAAGGGACTTGAGATCCCCGTTGCTATGGTTTTCTGA
Hi deepakumar, I have the same problem with the Ka/Ks calculator. Do you have any idea how to resolve it?
Can someone post an example of a pair for which it does work?
I think it might be due to the alignment result: if the two sequences differ too much in length, you will have to trim the unaligned parts.
Moreover, MUSCLE is likely not the best tool to align the sequences for this purpose, as you will need a codon-aware alignment (i.e. align the DNA sequence codon by codon); otherwise the Ka/Ks estimates will not make any sense!
In addition, can someone explain exactly how you are running the tool? There are three people with the problem and just one test sequence, but no mention of how the tool is being used. Is it the command-line version? The Windows version? The online version? What parameters (if any) are being used?
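One common way to get a codon-aware alignment is to align the translated proteins first and then thread the original coding sequence back onto that protein alignment, three bases per residue. The snippet below is only a minimal sketch of that idea, not what MUSCLE or the Ka/Ks calculator does internally. The function name `thread_codons` is made up for illustration, and it assumes the protein was translated in frame from the given CDS with the terminal stop codon already removed.

```python
def thread_codons(aligned_protein, cds):
    """Map an aligned protein (with '-' gaps) back onto its coding sequence.

    Assumption: `cds` is in frame, has no terminal stop codon, and encodes
    exactly the residues found in `aligned_protein`.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError(
            f"protein has {n_residues} residues but CDS has {len(codons)} codons"
        )
    next_codon = iter(codons)
    # Each protein gap becomes a three-base gap, each residue its own codon.
    return "".join("---" if aa == "-" else next(next_codon) for aa in aligned_protein)


# A gap in the protein alignment becomes a whole-codon gap in the DNA.
print(thread_codons("M-K", "ATGAAA"))
```

Because gaps are always inserted in blocks of three, both sequences of a pair keep the same reading frame and end up the same length.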
@deepkumar and @adeena_hassan I've been observing the same problem. Did you solve it somehow? Thanks
Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
Ka/Ks calculator is a command-line tool that takes input in AXT format. A Perl script that converts a FASTA file into AXT is available with the tool. The tool reads a pair of sequences and computes the corresponding estimates (the lengths of the two sequences must be equal).
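To check the equal-length requirement before running the tool, something like the following could be used. It is a rough sketch and not the Perl script shipped with the tool. The helper names `read_fasta` and `fasta_pair_to_axt` are hypothetical, and the record layout (one name line, two sequence lines, one blank line) is my assumption about the AXT variant the calculator expects. It joins sequences that are wrapped over several lines, since a sequence split across lines is one possible way for the two lengths to end up different.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence); wrapped lines are joined."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else "unnamed"), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_pair_to_axt(text):
    """Turn an aligned two-sequence FASTA into one AXT-style record (assumed layout)."""
    records = read_fasta(text)
    if len(records) != 2:
        raise ValueError(f"expected 2 sequences, found {len(records)}")
    (name1, seq1), (name2, seq2) = records
    if len(seq1) != len(seq2):
        raise ValueError(
            f"aligned lengths differ: {name1}={len(seq1)}, {name2}={len(seq2)}"
        )
    return f"{name1}-{name2}\n{seq1}\n{seq2}\n\n"


example = ">a some description\nATG---\nAAA\n>b\nATGCCC\nAAA\n"
print(fasta_pair_to_axt(example))
```

If this raises the length error on a MUSCLE output, the problem is in the alignment file itself rather than in the Ka/Ks step.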
Here is my input file in AXT format:
Command:
I'm not sure this example makes sense: to me it does not even seem to be "aligned". Also, the stretches of Ns might cause some issues.
Were you able to make it work in the end? I removed the piece of code that ashatan.314 noted as the problem, but I still got the same error. Also, other sequences whose lengths were not divisible by 3 didn't give me any error.
@katerinapargana Ka/Ks calculator only works on pairs of sequences, and it worked well with the input given above. Before the calculation, gaps and stop codons in the compared sequences are removed.
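To illustrate what removing gaps and stop codons means for a pair, here is a small sketch. It is not the tool's actual code. The function name `clean_codon_pairs` is hypothetical, and it assumes the standard genetic code and that the two aligned sequences are in frame from the first column.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def clean_codon_pairs(seq1, seq2):
    """Drop codon columns where either sequence has a gap or a stop codon."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    kept1, kept2 = [], []
    usable = len(seq1) - len(seq1) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if "-" in c1 or "-" in c2:
            continue
        if c1 in STOP_CODONS or c2 in STOP_CODONS:
            continue
        kept1.append(c1)
        kept2.append(c2)
    return "".join(kept1), "".join(kept2)


# The gapped codon and the shared stop codon are both dropped.
print(clean_codon_pairs("ATG---AAATAA", "ATGCCCAAGTAA"))
```

Note that a gap that is not a multiple of three shifts every downstream codon, which is another reason to prefer a codon-aware alignment.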