Error during running CD-HIT
3.0 years ago

Hi, I am running CD-HIT on very large files and get the following warning:

Warning: Some seqs are too long, please rebuild the program with make parameter MAX_SEQ=new-maximum-length (e.g. make MAX_SEQ=10000000) Not fatal, but may affect results.

I tried to set MAX_SEQ=10000000, but I could not.

I am using bash.

Could you please help me?


Recompile the program using:

make -B MAX_SEQ=10000000

Thanks for the help, but it did not work.


Thanks! When I run make MAX_SEQ=10000000, I get:

"In file included from cdhit-common.c++:28:0:
cdhit-common.h:39:9: fatal error: zlib.h: No such file or directory
 #include<zlib.h>"

Could you please help me?

3.0 years ago
Mensur Dlakic ★ 28k

MAX_SEQ needs to be edited in file cdhit-common.h which is part of the CD-HIT distribution. In the file I have, it is set to:

#define MAX_SEQ 655360

Simply put a larger number there and recompile (just make should do the trick).
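
The same edit can be scripted. This is only a minimal sketch: it assumes the header contains a line of the form "#define MAX_SEQ <number>" as in my copy (other CD-HIT versions may lay it out differently), and the helper name set_max_seq is made up for illustration.

```shell
# Hypothetical helper: rewrite the number on the "#define MAX_SEQ <number>"
# line of a header file, leaving every other line untouched.
# Usage: set_max_seq cdhit-common.h 10000000
set_max_seq() {
    header=$1
    new_value=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the
    # original so the header keeps its permissions.
    sed -E "s/^([[:space:]]*#define[[:space:]]+MAX_SEQ)[[:space:]]+[0-9]+/\1 ${new_value}/" \
        "$header" > "$tmp" && cat "$tmp" > "$header"
    status=$?
    rm -f "$tmp"
    return $status
}
```

After changing the header, recompile so that the new value is actually built into the binaries.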


Hi,

I ran "git clone https://github.com/weizhongli/cdhit.git"

Then, changed the directory to cdhit.

Then ran "make MAX_SEQ=10000000"

I got

"g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-common.c++ -c
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit.o cdhit-common.o cdhit-utility.o -lz -o cd-hit
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-est.o cdhit-common.o cdhit-utility.o -lz -o cd-hit-est
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-2d.o cdhit-common.o cdhit-utility.o -lz -o cd-hit-2d
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-est-2d.o cdhit-common.o cdhit-utility.o -lz -o cd-hit-est-2d
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-div.c++ -c
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-div.o cdhit-common.o cdhit-utility.o -lz -o cd-hit-div
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-454.c++ -c
g++  -fopenmp -DWITH_ZLIB -O2 -DMAX_SEQ=10000000  cdhit-454.o cdhit-common.o cdhit-utility.o -lz -o cd-hit-454"

But when I ran CD-HIT again, I got the same error: "Some seqs are too long, please rebuild the program with make parameter MAX_SEQ=new-maximum-length (e.g. make MAX_SEQ=10000000)".

I also edited the file manually, but still got the same error.


It is not an error, it is a warning. That may seem like nitpicking, but warnings are typically less severe and will let the program finish.

There are two explanations: 1) you have a sequence in your data longer than 10 million, so the MAX_SEQ number needs to be even larger. 2) you are running the conda-installed version rather than the one you just compiled. Typing which cd-hit should tell you the program location as the system sees it. If it points to conda directories rather than what you have just compiled, that would explain the warning.

To be sure, you can run the command with the full path. Assuming that after compilation your program is in /home/mostafa/cd-hit/bin, you would run it as:

/home/mostafa/cd-hit/bin/cd-hit .... [ remaining parameters ]
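
Both explanations can be checked quickly from bash. The following is a rough sketch rather than anything that ships with CD-HIT: the function names max_seq_len and resolved_path are invented for this example, the input is assumed to be a plain (uncompressed) FASTA file, and my_sequences.fasta is a placeholder name.

```shell
# Print the length of the longest sequence in a FASTA file.
# Header lines start with ">"; sequence lines may be wrapped over
# several lines, so lengths are summed until the next header.
max_seq_len() {
    awk '
        /^>/ { if (len > max) max = len; len = 0; next }
             { sub(/\r$/, ""); len += length($0) }   # drop a trailing CR, if any
        END  { if (len > max) max = len; print max + 0 }
    ' "$1"
}

# Print the executable the shell resolves for a command name
# (the same idea as `which`, using the POSIX builtin `command -v`).
resolved_path() {
    command -v "$1"
}

# Example calls (placeholder file name):
#   max_seq_len my_sequences.fasta
#   resolved_path cd-hit
```

If the first number is larger than the MAX_SEQ you compiled with, raise MAX_SEQ further. If the second command prints a path other than the directory where you just compiled, you are running a different copy of the program.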

It worked.

Thanks!
