How to use cd-hit-para.pl in SGE?
2.4 years ago
FelipeMSD • 0

Dear all,

I have a FASTA file with more than 200 million protein sequences that I would like to cluster into a non-redundant catalogue (100% identity) using cd-hit. Because the file is so big, I thought cd-hit-para.pl could be a good way to speed this up. At my institution we use SGE, and I tried to submit the qsub script below to a queue, but without success (error message: no host at /bin/cd-hit-para.pl line 97). I followed the user guide (http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf), but I don't think I understood how to use it properly. Does anyone have an example of how to run cd-hit-para.pl under SGE, or can you tell me whether there is a better way to use cd-hit on a file this large?

Script:

#!/bin/bash
#$ -N cdhit
#$ -o /output/logs/$JOB_NAME_$JOB_ID.out
#$ -e  /output/error/$JOB_NAME_$JOB_ID.err
#$ -l virtual_free=20G,h_vmem=20G,h_rt=6:00:00
#$ -q long-sl7
#$ -pe smp 8

cd-hit-para.pl -i file.faa -o file_100.faa -c 1.0 -M 20000 -T $NSLOTS --T "SGE"-Q 20

Command line:

$ qsub cdhit
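
A possible variant of the call above, offered as a sketch and not as a tested fix. As far as I can tell from the user guide, the options that belong to cd-hit-para.pl itself (--T for the queue type, --Q for the number of queued jobs) are written with two dashes, and every other option is handed on to cd-hit. On that reading, `-Q 20` would not be picked up as the job count, which may be what leads to "no host". The sketch also assumes that the wrapper submits its own jobs with qsub, so it is started directly on a submit host and not from inside another qsub job. The variable names and the dry-run fallback are only for illustration; please check the flags against `cd-hit-para.pl -h` on your installation.

```shell
#!/bin/bash
# Sketch only: option spellings follow my reading of the cd-hit user guide
# and are assumptions until confirmed with `cd-hit-para.pl -h`.

in_file=file.faa      # input FASTA, as in the original script
out_file=file_100.faa # output prefix, as in the original script
n_jobs=20             # number of jobs the wrapper should submit to SGE

# Wrapper options use two dashes (--T, --Q); -c and -M are passed on to cd-hit.
set -- cd-hit-para.pl -i "$in_file" -o "$out_file" -c 1.0 -M 20000 --T SGE --Q "$n_jobs"
cmd_line="$*"

if command -v cd-hit-para.pl >/dev/null 2>&1; then
    # Tool is on PATH: run it from a host that is allowed to call qsub.
    "$@"
else
    # Tool not found: print the command so it can be inspected.
    echo "dry run: $cmd_line"
fi
```

The `-T $NSLOTS` option is left out here because `$NSLOTS` is only set inside a running SGE job; whether and how to set cd-hit threads for the jobs the wrapper submits depends on the local queue setup.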
SGE cd-hit-para.pl • 518 views