Question

Error Muscle alignment (msa package, through Rscript)

4.9 years ago

alnam ▴ 10

Hi,

I ran into a problem while running Muscle through the msa package in R (via Rscript). I get this:

*** ERROR ***  MSA::SetIdCount: cannot increase count
Fatal error, exception caught.
Error in msaFun(inputSeqs = inputSeqs, cluster = cluster, gapOpening = gapOpening,  : 
  MUSCLE finished by an unknown reason

I couldn't find out what it means. Could it be that I have too many sequences? If anyone has an idea of how to solve this, that would be great.

Thanks!

msa r muscle • 5.0k views

ADD COMMENT • link updated 7 months ago by ulrich.bodenhofer ▴ 10 • written 4.9 years ago by alnam ▴ 10

What is your input data like? You haven't told us how many sequences you have, so we can't tell you if it's too many.

Have you tried running it on a subset of your data?

It would also be worth trying to run the data through MUSCLE directly, without the R wrapper.

ADD REPLY • link 4.9 years ago by Joe 22k

Thanks for your answer! I am running it on 10,000-20,000 sequences of around 1,500 bp, 96 times. When I ran a case that had failed in the loop on its own, there was no issue. I think the issue is the one described in the "Known issues" section of the msa package vignette (here: https://bioconductor.org/packages/release/bioc/vignettes/msa/inst/doc/msa.pdf ), which is that there can be memory leaks with Muscle? Anyway, I used ClustalOmega instead and it worked very well.

Thanks again!

ADD REPLY • link 4.9 years ago by alnam ▴ 10

Are your sequence headers unique? Are you using a profile? Could you provide the command that you're using?

This is the part of the code that gives you the error:

void MSA::SetIdCount(unsigned uIdCount)
        {
        if (m_uIdCount > 0)
            {
            if (uIdCount > m_uIdCount)
                Quit("MSA::SetIdCount: cannot increase count");
            return;
            }
        m_uIdCount = uIdCount;
        }

If you search for MSA::SetIdCount in the link below, you can find the code where this function is used, which might give you some idea of the reason behind the error:

https://git.wur.nl/haars001/reas/-/tree/master/muscle3.6_src

ADD REPLY • link 4.9 years ago by Fatima ▴ 1000

Hi, thanks for your reply!

Yes, I had found this, but I was not sure what it meant (what is m_uIdCount?). Yes, my headers are unique (I wasn't aware that duplicates could be a problem, so thanks for this), and I don't know whether I am using a profile. As I said, I have found a workaround, which is to use ClustalOmega instead of Muscle. I agree it does not solve the underlying issue, but at least it helps.

Thanks again!

ADD REPLY • link 4.9 years ago by alnam ▴ 10

Hi, did you manage to resolve this? I am getting exactly the same error. I created an alignment of 2067 sequences using dna = msa(dna, method = "Muscle", order = "aligned"), where the input was a DNAStringSet. I then ran a second alignment of 174 sequences, which also ran fine. But when I moved on to the third alignment, with 2644 sequences, I received the error "*** ERROR ***  MSA::SetIdCount: cannot increase count". So smaller alignments seem to run fine, and I only get the error when I move to bigger datasets.

ADD REPLY • link 4.8 years ago by Mac ▴ 20

I changed the method to dna = msa(dna, method = "ClustalOmega", order = "aligned") instead and it worked fine. I think there is a bug in the "Muscle" method.

ADD REPLY • link 4.8 years ago by Mac ▴ 20

Hi (quite late), sorry I had completely forgotten this. I did the same, there seems to be some issue with the muscle method (I haven't checked if it is still the case, though). Thanks for sharing your solution here!

ADD REPLY • link 3.2 years ago by alnam ▴ 10

I have this problem with 7 sequences.

SEQS
"A DNAStringSet instance of length 7"
names(SEQS):
"1" "2" "3" "4" "5" "6" "7"
any(duplicated(names(SEQS)))
"FALSE"

All sequences have a width of 361 nt.

Those sequences are also not the same:

require(stringdist)
d=stringdistmatrix(a=as.vector(SEQS), b=as.vector(SEQS), method="hamming",useNames ="names")

d[1,]
"1  2  3   4  5  6  7"
"1  0 16  6 22 13 17 12"

As previously noted, it works with method="ClustalOmega" (using Gonnet?).

ADD REPLY • link 4.8 years ago by gildas.lepennetier ▴ 10

score 1 · Answer 1 · 2024-08-23

I only became aware of this problem through a recent issue raised on the package's GitHub page (https://github.com/UBod/msa/issues/30), and I have tried to solve it. I am sorry for not having seen this thread for more than four years (!!), but I hope the solution still helps in some way.

The point is that there was a major bug in the Muscle interface that, to my deepest regret, we had not discovered until now: if you run Muscle more than once, no later call can use more sequences than the first call did. The fix was not trivial, but I tried my best. A new version (1.37.3) has now been pushed to the GitHub repo (https://github.com/UBod/msa) and to the BioC devel branch. I hope this closes the issue. If you encounter any further problems, please let me know by reopening the issue on GitHub (https://github.com/UBod/msa/issues/30) or by opening a new one.
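
To make the failure mode concrete, here is a minimal C++ sketch of how the guard in MSA::SetIdCount (quoted earlier in this thread) behaves when its counter survives from one alignment to the next. It is only an illustration under stated assumptions: the counter is modelled as a static member shared by all calls in one process, Quit() is replaced by an exception, and MsaSketch, ResetIdCount and RunAlignments are hypothetical names that exist neither in MUSCLE nor in the msa package. The reset shown here is not meant to represent the actual fix in version 1.37.3.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for MUSCLE's MSA class (hypothetical name).
// Assumption: the id counter is shared by every alignment in the process,
// modelled here as a static member.
class MsaSketch {
public:
    // Same logic as the guard quoted in the thread, but throwing an
    // exception instead of calling Quit().
    static void SetIdCount(unsigned uIdCount) {
        if (m_uIdCount > 0) {
            if (uIdCount > m_uIdCount)
                throw std::runtime_error("MSA::SetIdCount: cannot increase count");
            return;
        }
        m_uIdCount = uIdCount;
    }

    // Hypothetical helper for this sketch only: forget the previous count.
    static void ResetIdCount() { m_uIdCount = 0; }

private:
    static unsigned m_uIdCount;
};

unsigned MsaSketch::m_uIdCount = 0;

// Hypothetical driver: pretends to run one alignment per entry in `sizes`
// within a single process and records "ok" or the error message for each.
std::vector<std::string> RunAlignments(const std::vector<unsigned>& sizes,
                                       bool resetBetweenRuns) {
    MsaSketch::ResetIdCount();  // start from a fresh process state
    std::vector<std::string> outcomes;
    for (unsigned n : sizes) {
        if (resetBetweenRuns)
            MsaSketch::ResetIdCount();
        try {
            MsaSketch::SetIdCount(n);
            outcomes.push_back("ok");
        } catch (const std::runtime_error& e) {
            outcomes.push_back(e.what());
        }
    }
    return outcomes;
}
```

Calling RunAlignments({2067, 174, 2644}, false) reproduces the pattern reported above: the first two runs succeed and the third fails, because 2644 exceeds the count of 2067 remembered from the first call. With resetBetweenRuns set to true, all three runs succeed. Under the same assumption, this would also explain why a failing case ran fine when executed on its own.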