Question

Segmentation fault for cd-hit

9.7 years ago

Crystal ▴ 70

Hello All,

I am trying to compare two databases and remove the redundant sequences from one of them using cd-hit.

I found that someone on this forum had the same problem, but I did not see a resolution.

The command I used is:

tools/cd-hit-v4.6.1-2012-08-27/cd-hit-est-2d -i V3_VFs.fas -i2 R1.ffn -G 0 -c 1.0 -AS 0 -AL 0 -aL 1.0 -aS 1.0 -o res2_R1_cdhitG

The output looks like this:

================================================================
                            Output                              
----------------------------------------------------------------
total seq in db1: 5927
total seq in db2: 2457
longest and shortest : 31497 and 78
Total letters: 7764226
Sequences have been sorted
longest and shortest : 16680 and 99
Total letters: 2818134
Approximated minimal memory consumption:
Sequence        : 10M
Buffer          : 1 X 15M = 15M
Table           : 1 X 16M = 16M
Miscellaneous   : 4M
Total           : 47M
Table limit with the given memory limit:
Max number of representatives: 0
Max number of word counting entries: 94035159
Segmentation fault

My colleague used the same command on her Mac to compare two other databases, and it worked.

How can I solve this problem?

Thanks
Crystal

software-error • 2.5k views

updated 9 months ago by GenoMax 148k • written 9.7 years ago by Crystal ▴ 70

GenoMax · Answer 1 · 2024-03-09

I tested the command that you supplied, and it works well. My machine is a PC, not a Mac. Your path shows cd-hit v4.6.1 (2012-08-27), while I ran V4.8.1, so try updating cd-hit and running the command again.

cd-hit-est-2d -i Galaxy117_transcripts.fasta -i2 Galaxy349_trinity_transcripts.fasta -c 1.0 -AS 0 -AL 0 -aL 1.0 -aS 1.0 -o output_cdhit
================================================================
Program: CD-HIT, V4.8.1 (+OpenMP), Jul 25 2023, 19:20:28
Command: cd-hit-est-2d -i Galaxy117_transcripts.fasta -i2
         Galaxy349_trinity_transcripts.fasta -c 1.0 -AS 0 -AL 0
         -aL 1.0 -aS 1.0 -o output_cdhit

Started: Sun Mar 10 09:02:55 2024
================================================================
                            Output                              
----------------------------------------------------------------
total seq in db1: 98196
total seq in db2: 48332
longest and shortest : 32042 and 140
Total letters: 96639203
Sequences have been sorted
longest and shortest : 29794 and 299
Total letters: 139620363

Approximated minimal memory consumption:
Sequence        : 244M
Buffer          : 1 X 19M = 19M
Table           : 1 X 18M = 18M
Miscellaneous   : 4M
Total           : 287M

Table limit with the given memory limit:
Max number of representatives: 40000
Max number of word counting entries: 64073727

..........    10000  finished
..........        0  compared          0  clusters
........    20000  finished
..........    30000  finished
..........    40000  finished
..........    50000  finished
..........    12906  compared          3  clusters
........    60000  finished
..........    70000  finished
..........    80000  finished
..........    90000  finished
..........    52906  compared         52  clusters
..........    92906  compared         52  clusters

48332 compared  52 clustered
writing non-redundant sequences from db2
writing clustering information
program completed !
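
One visible difference between the two logs is the "Max number of representatives" line: it reads 0 in the run that crashed and 40000 in the run that completed. Whether that value is related to the segmentation fault is not established in this thread, but it is easy to check after an upgrade. The following is a minimal sketch, assuming the cd-hit output has been saved to a text file; the function names `max_reps` and `check_log` and the log file name are hypothetical, not part of cd-hit.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a saved cd-hit log (not part of cd-hit).

# Print the value from the "Max number of representatives: N" log line.
max_reps() {
    sed -n 's/^Max number of representatives:[[:space:]]*//p' "$1" | head -n 1
}

# Report the value, flagging 0 as seen in the failing run from the question.
check_log() {
    reps=$(max_reps "$1")
    if [ -z "$reps" ]; then
        echo "$1: line not found"
    elif [ "$reps" -eq 0 ]; then
        echo "$1: representatives=0 (matches the failing run)"
    else
        echo "$1: representatives=$reps"
    fi
}
```

To use it, redirect the program's output to a file (for example by appending `> run.log 2>&1` to the cd-hit-est-2d command) and call `check_log run.log`. Note also that the two commands in this thread are not identical: the original includes `-G 0` and the test run does not, so it may be worth repeating the test with that option as well.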