Error in Calculation of Ka/Ks value using Ka/Ks calculator
8.7 years ago

Hi, I am using the Ka/Ks calculator software to identify rapidly evolving genes. I align my gene sequences with MUSCLE, convert the alignment to AXT format, and then run the tool. I am surprised that it works well with some alignments, while for others it gives me the error "Error. The size of two sequences in 'ID' is not equal." I do not understand this error. If anybody has an idea, please help.

Thanks

Deepak

Input file that produces the error:

>ENSBTAT00000025915.4 ensembl:known_by_projection chromosome:UMD3.1:10:89053688:89074011:-1 gene:ENSBTAG00000019454.4 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGA-----TTGC-------------------------------------------------------------------------------------------------------------------------GTCGTGT----------------------------------------------------------------------------------------------------------------------CTGTGTTA---CCTGCTGCTGCCGGCCGC-----------------GCGCCTTTTCC-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GCGCCCTCT----------------------------------------------------------------------------------------------------------------------------------CAGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTCGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCTGTCCTCCCTATGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCACGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATTGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTACCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTG-------------------ATCGCAGGCTGTATCAGG--ATGGAACCCTCAAGCTCCTGGGCCGGCTCTCGCTCCTCTCTGAAGAGATCCTCTGGGCTGCCAACGGCTTACCCAACCC--CTTCTGCTCTTCAGA----------------CCAC--CTCTGCCTACTGGCTAGCTT--CGGGATGGAAATCGCGGCCCC---------------ATGA
ATGAGACGTTTGCTGCAGCGCTCAGGTCCTTTCACTGCAGCGCACAGACTCCCTAGTTGTGGCCTGAGTGGTCCAAAGGGCGCGGGTTCAGTAACTGCAGCACGTGGACTTCGTAGCTCCACGGCAGTCAATCAGTCGTGTCCAACTCTTTGCGACCCCATGGACTGCAGCACACCAGGCTTCCCTGTCCATCACCAACTTCCAGAGCCTGCTCAAACTCAAGTCCATCGAGTCAGTGATGCCATCCCACCATCTCATCCTCTGTCATCCCCTTCTCCTCCTGGCTTCAATCTTTCCCAGCATCAGTGTCTTTTCCAAGGCATTTTCAAGAAGAAGAGAGAGGCCAAGGATAAAATCCTAGAAACACCCTCATTGAAAGGTGGGTGTTCAGAGGAACAGAAAGAAGAGGTGCTAGGACAAAACGCCCACAGAGTTGGAAGGAAAGTAAGAATGTTACAGAAACCCGAGGAAAGAGACTCTTGTAGTTATGTGCTAGTTCCTAAGAGGCTTCCCAATCTTCAGGTTACCCCAGTGGCCTCTGCTTCATCCCTGGAGAAGCCTTGCAGTGTTATGATGTACCCAGCTGGTCCTGGATCTGGTGACTCTCCAGAGGGCTTCCCGGAACAACCAAGGCGCCAGTCCCAAACCGGAAATCGCACGGCTCAGACACTGGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTTGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCCGTCCTCCCTGTGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCATGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATCGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTATCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTGGCACGTGCTGCGAGCTTGAAGCAAAGGGAGCAGCAGGAAATGTGGCCATCCACCCAGTAGGCTGGCTTCTGGTCCTCTC----GGGAGCCTGTG--------ACGGC--ACTCAGCTCTGCGTCAGTTCCAAGAAAGTGGCCGTGCGGTATCCACGGTTCTGC--ACTGGCGACCGTGGTGAGAAGGGACTTGAGATCCCCGTTGCTATGGTTTTCTGA
software error • 7.6k views

Hi deepakumar, I have the same problem with the Ka/Ks calculator. Do you have any idea how to resolve it?


Can someone post an example of a pair for which it does work?

I think it might be due to the alignment itself: if the two sequences differ too much in length, you will have to trim the unaligned parts.

Moreover, MUSCLE is likely not the best tool to align the sequences for this purpose, because you need a codon-aware alignment (that is, the DNA sequences aligned codon by codon); otherwise the Ka/Ks estimates will not make any sense!


In addition, can someone explain how exactly you are running the tool? Three people have reported the problem, with just one test sequence, but nobody has mentioned how the tool is being used. Is it the command-line version? The Windows version? The online version? What parameters (if any) are being used?


@deepkumar and @adeena_hassan I've been observing the same problem. Did you solve it somehow? Thanks


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread stays logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should be reserved for providing a solution to the question asked.


Ka/Ks calculator is a command-line tool that takes input in AXT format. A Perl script that converts a FASTA file into AXT is available with the tool. It reads a pair of sequences and computes the corresponding estimates (the lengths of the two sequences must be equal).

Here is my input file in AXT format:

cat Selection_test.axt

Human_gene-Dog_gene
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGAGGTTTGGCTACGGAGGCTGGGTGCAGCTCCAACACCACAGCAAGGGAGAAGCAAAGTTGTTGAAAGATGGGAAAGTCTTCAGAGTGGTGCGAGAGAATTGCTGGGATCCAGAAGTTCGTATTAGAGAAATGGACCAAAAAGGAGTAACAGTGCAAGCCCTTTCCACAGTTCCTGTCATGTTTAGCTACTGGGCCAAACCTGAGGACACTTTAAACCTGTGCCAGCTTTTAAACAACGACCTTGCCAGCACCGTTGTGAGCTACCCCAGGAGGTTCGTGGGTCTGGGGACGTTGCCCATGCAGGCCCCTGAGCTGGCGGTCAAGGAGATGGAGCGCTGTGTGAAAGAGCTGGGCTTTCCCGGGGTCCAAATTGGCACCCACGTCAACGAGTGGGACCTGAACGCGCAGGAGCTCTTTCCTGTCTATGCGGCAGCCGAAAGGCTGAAGTGTTCCCTGTTCGTGCATCCCTGGGACATGCAGATGGATGGACGAATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCATAGCCATTTGCTCCATGATCATGGGTGGAGTATTTGAGAAGTTTCCCAAACTGAAAGTGTGTTTCGCACATGGTGGTGGTGCCTTCCCCTTCACAGTGGGAAGAATCTCCCATGGATTCAGCATGCGCCCAGATCTGTGTGCCCAGGACAACCCCATGAACCCGAAGAAATACCTTGGTTCCTTTTACACAGATGCTTTGGTTCATGATCCTCTGTCCCTCAAGCTGTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACCGATTACCCCTTTCCACTAGGTGAGCTGGAACCTGGGAAACTAATAGAGTCCATGGAAGAATTTGATGAAGAAACAAAGAATAAACTCAAAGCCGGCAATGCCCTGGCATTTTTGGGTCTTGAGAGAAAACAATTTGAATGA
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGCGATTCAGCTATGGAGGCTGGGTGCAGCTTCAACACCACAGCAAGGGAGAAGCAAAAATGTTGAAGGATGGGAAGGTCTTCAGAGTGGTCCAAGAGAACTGCTGGGATCCAGAAGTCCGTATTAGAGAAATGGACCAAACAGGAGTGTCCGTGCAAACCCTTTCCACAGTCCCCCTCATGATTAGCTATTGGGCCAAACCTCAGGACACTTTAGACCTGTGCCAGCTTTTAAACAACGACTTAGCTGCCACTGTTGCGAACCATCCCAGGAGGTTTGTGGGCCTGGGGACATTGCCCATGCAGGCTCCTGAGCTTGCCGTCAAGGAGATGGAGCGCTGTGTGAAGGAGCTGGGCTTTCCCGGGGTCCAGATTGGTTCCCATATCAACGAGTGGGACCTGAATGCACGGGAACTCTTCCCCTTCTACGCATTAGCAGAAAAACTGAACTGTTCGTTATTTGTGCACCCCTGGGACATGCAAATGGATGGACGGATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCACAGCCATTTGTTCCATGATCATGGGAGGAGTGTTTGAGAAATTTCCTAAATTGAAAGTGTGTTTTGCACATGGAGGTGGTGCCTTCCCTTTCACAGTTGGAAGAATCTCCCATGGATTCAACATGCGTCCAGATCTGTGTGCCCAGGACAATCCAATCAACCCAAAGAAATACCTTGGTTCCTTTTACACAGACTCCTTGGTTCATGATCCTCTGGCACTCAAGCTCTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACAGATTACCCCTTTCCACTAGGAGAGCTGAAACCTGGGAAATTGATAGAGTCCATAGAAGAATTTGATGCAGAAACAAAGGATAAACTCAAAGCTGGCAATGCCCTCACATTTTTGGGCCTTGAGAGAAAACAATTCGAATGA

Human_gene-Wolf_gene
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGAGGTTTGGCTACGGAGGCTGGGTGCAGCTCCAACACCACAGCAAGGGAGAAGCAAAGTTGTTGAAAGATGGGAAAGTCTTCAGAGTGGTGCGAGAGAATTGCTGGGATCCAGAAGTTCGTATTAGAGAAATGGACCAAAAAGGAGTAACAGTGCAAGCCCTTTCCACAGTTCCTGTCATGTTTAGCTACTGGGCCAAACCTGAGGACACTTTAAACCTGTGCCAGCTTTTAAACAACGACCTTGCCAGCACCGTTGTGAGCTACCCCAGGAGGTTCGTGGGTCTGGGGACGTTGCCCATGCAGGCCCCTGAGCTGGCGGTCAAGGAGATGGAGCGCTGTGTGAAAGAGCTGGGCTTTCCCGGGGTCCAAATTGGCACCCACGTCAACGAGTGGGACCTGAACGCGCAGGAGCTCTTTCCTGTCTATGCGGCAGCCGAAAGGCTGAAGTGTTCCCTGTTCGTGCATCCCTGGGACATGCAGATGGATGGACGAATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCATAGCCATTTGCTCCATGATCATGGGTGGAGTATTTGAGAAGTTTCCCAAACTGAAAGTGTGTTTCGCACATGGTGGTGGTGCCTTCCCCTTCACAGTGGGAAGAATCTCCCATGGATTCAGCATGCGCCCAGATCTGTGTGCCCAGGACAACCCCATGAACCCGAAGAAATACCTTGGTTCCTTTTACACAGATGCTTTGGTTCATGATCCTCTGTCCCTCAAGCTGTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACCGATTACCCCTTTCCACTAGGTGAGCTGGAACCTGGGAAACTAATAGAGTCCATGGAAGAATTTGATGAAGAAACAAAGAATAAACTCAAAGCCGGCAATGCCCTGGCATTTTTGGGTCTTGAGAGAAAACAATTTGAATGA
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGCGATTCGGCTATGGAGGCTGGGTGCAGCTTCAACACCACAGCAAGGGAGAAGCAAAAATGTTGAAGGATGGGAAGGTCTTCAGAGTGGTCCAAGAGAACTGCTGGGATCCAGAAGTCCGTATTAGAGAAATGGACCAAACAGNNNNNnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCCAAACCTCAGGACACTTTAGACCTGTGCCAGCTTTTAAACAACGACTTANCTGCCACTGTTGCGAACCATCCCAGGAGGTTTGTGGGCCTGGGGACATTGCCCATGCAGGCTCCTGAGCTTGCCGTCAAGGAGATGGAGCGCTGTGTGAAGGAGCTGGGCTTTCCCGGGGTCCAGATTGGTTCCCATATCAACGAGTGGGACCTGAATGCACGGGAACTCTTCCCCTTCTACGCAGTGGCAGAAAAACTGAACTGTTCGTTATTTGTGCACCCCTGGGACATGCAAATGGATGGACGGATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCACAGCCATTTGTTCCATGATCATGGGAGGAGTGTTTGAGAAATTTCCTAAATTGAAAGTGTGTTTTGCACATGGAGGTGGTGCCTTCCCTTTCACAGTTGGAAGAATCTCCCATGGATTCAACATGCGTCCAGATCTGTGTGCCCAGGACAATCCAATCAACCCAAAGAAATACCTTGGTTCCTTTTACACAGACTCCTTGGTTCATGATCCTCTGGCACTCAAGCTCTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACAGATTACCCCTTTCCACTAGGAGAGCTGAAACCTGGGAAATTGATAGAGTCCATAGAAGAATTTGATGCAGAAACAAAGGATAAACTCAAAGCTGGCAATGCCCNNNNNNNNNNNNNNnnnnnnnnnnnnnnnnnnnnnnnn

Command:

./KaKs_Calculator -i Selection_test.axt -o Selection_output.txt

I am not sure this example makes sense: to me it does not even seem to be "aligned". The stretches of Ns might also cause some issues.


Were you able to make it work in the end? I removed the piece of code that ashatan.314 noted as the problem, but I still had the same error. Also, other sequences whose lengths were not divisible by 3 didn't give me any error.


@katerinapargana Ka/Ks calculator only works on pairs of sequences, and it worked well with the input given above. Before the calculation, gaps and stop codons in the compared sequences are removed.

5.6 years ago

There are two reasons for this problem:

  • your sequence alignment is based on nucleotides rather than proteins, so codons get pulled apart by gaps and the length of your sequence is no longer divisible by 3

  • your gap lengths are not divisible by 3.

After fighting with this error, this is the pipeline I settled on:

  1. Align proteins using MUSCLE or T-COFFEE
  2. Convert the protein alignment into a nucleotide alignment using pal2nal, with the -nogap argument to remove gaps whose lengths are not divisible by 3 and non-overlapping regions
  3. Convert to AXT using KaKs-Calculator's AXTConverter
  4. Run KaKs_Calculator

Thanks to ashatan.314 for the answer; the error message really hides the much more common problem of length % 3 != 0.
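Steps 1 and 2 of the pipeline above amount to threading each coding sequence onto its aligned protein, so that every protein gap becomes a three-nucleotide gap. The following is a minimal Python sketch of that idea, not pal2nal's actual implementation. The function names `thread_codons` and `drop_gapped_codons` and the toy sequences are hypothetical, and the sketch assumes each CDS is in frame with exactly three nucleotides per aligned residue (stop codon already stripped).

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped CDS onto one gapped row of a protein alignment."""
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length must be 3 x number of residues")
    out = []
    pos = 0  # current offset into the CDS
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # one protein gap -> one codon-sized gap
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def drop_gapped_codons(row1: str, row2: str):
    """Remove every codon column in which either row contains a gap."""
    kept1, kept2 = [], []
    for i in range(0, len(row1), 3):
        c1, c2 = row1[i:i + 3], row2[i:i + 3]
        if "-" not in c1 and "-" not in c2:
            kept1.append(c1)
            kept2.append(c2)
    return "".join(kept1), "".join(kept2)


# Toy example: the second protein lacks the middle residue.
codon_row1 = thread_codons("MKL", "ATGAAACTG")
codon_row2 = thread_codons("M-L", "ATGCTC")
print(codon_row1)
print(codon_row2)
print(drop_gapped_codons(codon_row1, codon_row2))
```

Because gaps are only ever inserted three nucleotides at a time, both rows end up with the same length and that length is a multiple of 3, which is what the length check in KaKs_Calculator requires.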


Hi, Philipp. Thanks for your comments, they are really helpful. I also got stuck on this. I do not have a protein alignment; my input file is an orthologous gene family, so pal2nal is not suitable for me. I tried to use a Perl script:

use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;

die "perl $0 <fa> <OUT>" unless ( @ARGV == 2 );

# Open the input alignment; the format is declared as ClustalW here.
my $in = Bio::AlignIO->new(
    -file   => $ARGV[0],
    -format => 'clustalw'
);
open my $out, '>', $ARGV[1] or die "$!";
while ( my $aln = $in->next_aln() ) {
    # Alternative that collects every sequence in the alignment (disabled):
    # my ( @id, @seq1 );
    # foreach my $seq ( $aln->each_seq() ) {
    #     push @id,   $seq->id;
    #     push @seq1, $seq->seq;
    # }
    # print $out "$id[0]&$id[1]\n$seq1[0]\n$seq1[1]\n\n";

    # Write the first two sequences of each alignment as one AXT record.
    my $seq1 = $aln->get_seq_by_pos(1);
    my ( $id1, $sequence1 ) = ( $seq1->id, $seq1->seq );
    my $seq2 = $aln->get_seq_by_pos(2);
    my ( $id2, $sequence2 ) = ( $seq2->id, $seq2->seq );
    print $out "$id1&$id2\n$sequence1\n$sequence2\n\n";
}
$in->close();
close($out);

However, the sequences are still reported as not equal. Could you please help me check what the problem is? Thanks.


I edited your comment to make the code look nicer. I don't know Perl well, but isn't your input already a ClustalW alignment? I don't understand what your input file is; there are many ways to store an orthologous gene family.


Thanks Philipp. My input is a set of orthologous sequences, already aligned, in a FASTA file.


Actually, I'm facing the same problem with KaKs_Calculator. Apart from the AXT converter being buggy and removing gaps from the alignment, I think this piece of code in KaKs.cpp:

try {
    //Check whether (sequence length)/3==0
    if (str1.length()!=str1.length() || str1.length()%3!=0 || str2.length()%3!=0) {
        cout<<endl<<"Error. The size of two sequences in "<<"'"<<name<<"' is not equal."<<endl;
        throw 1;
    }

is also not doing a good job. I'm really sure that the sequences I'm providing to KaKs_Calculator are properly aligned (gaps are divisible by three and all the sequences have the same length), and yet the program aborts with an error message saying my sequences are not the same length. I have no idea what is causing this behavior, and I don't know C well enough to play around with the code and find the reason (maybe a hidden character or white space introduced by the parser in the C code).

The funny thing is that I'm submitting around 6k sequences to KaKs_Calculator and 70% of the runs worked, while the other 30% give me errors due to the "size problem", even though they were all processed with the same programs and pipeline. At this point, having failed to contact the authors, I will just comment out the section in KaKs.cpp that checks the length of the sequences (because I'm totally sure they are correct) and run the software without it.

What amazes me the most is the lack of documentation for this tool, despite being widely used and cited. Even in genome papers, I could not get command lines or proper descriptions of how people submitted and processed the sequences from KaKs.

Anyways, this is my experience. Frustrating, if I might say. Best, André
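One way to test the hidden-character hypothesis raised above is to inspect the AXT file before running the tool. The following is a minimal Python sketch under stated assumptions: the function name `inspect_axt` is hypothetical, and the record layout (a name line, two sequence lines, then a blank line) is taken from the example input posted earlier in this thread. Whether the tool's own parser tolerates such characters is not established here.

```python
def inspect_axt(text: str):
    """Yield (name, len1, len2, odd_chars) for each name/seq/seq record."""
    # Split on "\n" only, so a stray "\r" from Windows line endings
    # stays attached to its line and is counted in the length.
    lines = [ln for ln in text.split("\n") if ln.strip() != ""]
    for i in range(0, len(lines) - 2, 3):
        name, s1, s2 = lines[i], lines[i + 1], lines[i + 2]
        # Anything other than nucleotides, N and "-" is reported as odd.
        odd = sorted(set(s1 + s2) - set("ACGTNacgtn-"))
        yield name.strip(), len(s1), len(s2), odd


# Toy input: the first record has Windows line endings,
# the second has rows of unequal length.
sample = "pair1\r\nATGAAA\r\nATGAAG\r\n\r\npair2\nATGAAA\nATG-AAG\n"
for name, len1, len2, odd in inspect_axt(sample):
    flags = []
    if len1 != len2:
        flags.append("unequal lengths")
    if len1 % 3 or len2 % 3:
        flags.append("length not divisible by 3")
    if odd:
        flags.append("unexpected characters: %r" % odd)
    print(name, len1, len2, "; ".join(flags) or "ok")
```

In the toy input, the carriage return makes an apparently clean 6-nucleotide row count as 7 characters, which is the kind of invisible difference that could separate the runs that work from the ones that fail.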


Thanks for sharing, André. I think the size problem occurs because the sequence length is not a multiple of three (there may be insertions after aligning the sequences), so KaKs_Calculator cannot work with that pair of sequences. It's annoying. Good luck! Best, Yu

6.2 years ago
ashatan.314 ▴ 10

I think I found the answer

There is a comment in the source code file KaKs.cpp:

try {
        //Check whether (sequence length)/3==0
        if (str1.length()!=str1.length() || str1.length()%3!=0 || str2.length()%3!=0) {
            cout<<endl<<"Error. The size of two sequences in "<<"'"<<name<<"' is not equal."<<endl;
            throw 1;
        }

So if the length of your sequence is not divisible by 3, the program throws an error and aborts. Also, as mentioned above, the output would not make sense if you have just aligned the two sequences blindly, with no correspondence to the protein sequence.
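Read literally, the quoted condition compares `str1.length()` with itself, so the equality test can never fire and only the two `% 3` tests can trigger the message. The following Python sketch transcribes the condition as quoted and contrasts it with what the error message suggests was intended. It is based solely on the excerpt above (other versions of the source may differ), and the function names are hypothetical.

```python
def quoted_check_fails(str1: str, str2: str) -> bool:
    """Python transcription of the condition quoted from KaKs.cpp."""
    return (len(str1) != len(str1)  # str1 against itself: always False
            or len(str1) % 3 != 0
            or len(str2) % 3 != 0)


def intended_check_fails(str1: str, str2: str) -> bool:
    """What the error message suggests was meant (an assumption)."""
    return (len(str1) != len(str2)
            or len(str1) % 3 != 0
            or len(str2) % 3 != 0)


examples = [
    ("ATGAAA", "ATGAAG"),      # equal length, divisible by 3
    ("ATGAAAC", "ATGAAGC"),    # equal length, NOT divisible by 3
    ("ATGAAA", "ATGAAACCC"),   # unequal length, both divisible by 3
]
for s1, s2 in examples:
    print(len(s1), len(s2),
          "quoted:", quoted_check_fails(s1, s2),
          "intended:", intended_check_fails(s1, s2))
```

The second example shows why the message is misleading: two rows of identical length are rejected as "not equal" simply because that length is not a multiple of 3.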


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

2.7 years ago
fznajar • 0

It seems that there should be an additional blank line between each pair of sequences.
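A minimal Python sketch of writing records in that layout, with a blank line between pairs, is shown below. The function name `to_axt` and the toy records are hypothetical, and the layout (name line, two sequence lines, blank separator) follows the example input posted earlier in this thread rather than any official format description.

```python
def to_axt(pairs) -> str:
    """Format (name, seq1, seq2) tuples as AXT-style text."""
    blocks = []
    for name, seq1, seq2 in pairs:
        # Refuse records that would trip the length check in the tool.
        if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
            raise ValueError(f"{name}: lengths {len(seq1)} and {len(seq2)} "
                             "must be equal and divisible by 3")
        blocks.append(f"{name}\n{seq1}\n{seq2}\n")
    # Joining with "\n" leaves exactly one empty line between records.
    return "\n".join(blocks)


records = [
    ("Human_gene-Dog_gene", "ATGAAAATT", "ATGAAAATC"),
    ("Human_gene-Wolf_gene", "ATGAAAATT", "ATGAAGATT"),
]
print(to_axt(records))
```

Validating each pair while writing also surfaces the offending record name immediately, instead of leaving it to the tool's error message.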
