SURPI is a pipeline for identifying pathogens in clinical metagenomics samples. It is tested only on Ubuntu and it assumes many things about your installation. I recently installed it on CentOS. Here are a few key points you need to take care of before running the pipeline.
The GitHub page of SURPI is https://github.com/chiulab/surpi and the pipeline is published in Genome Research.
- The create_snap_to_nt.sh program sets -Ofactor to 1000 on line 29, which may not work on your machine. You need to work out the correct value and change it accordingly. Read the SNAP aligner documentation for details.
- The abyss installation requires mmap. Make sure you have installed it before compiling abyss: http://hackage.haskell.org/package/mmap-0.5.9/mmap-0.5.9.tar.gz
- Make sure formatdb is in your path. It can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/
- The taxonomy_lookup.pl program calls sort --parallel=$cores at line 84. You may need to remove the --parallel=$cores option if the sort utility on your machine does not support --parallel.
- The abyss_minimus.sh program tries to use mpirun to run in parallel. If mpirun is not configured properly, you need to remove the option 'np=$cores' in line 86 so that the step does not run in parallel.
- The ribo_snap_bac_euk.sh program is hardcoded to pass 10,75 as arguments to crop_reads.csh, which you may need to change in line 43.
- The coveragePlot.py program uses mlab.load() at line 47, which is deprecated in the latest version of matplotlib. Hence, you may need to change it to np.loadtxt().
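Several of the points above come down to whether a tool is reachable or supports a given option, so a small preflight check can save a failed run. The following is only a sketch under assumptions: the function names and the tool list are hypothetical and not part of SURPI, and the sort test simply tries --parallel=1 on empty input and treats any failure as "not supported".

```python
import shutil
import subprocess

# Tools the points above say must be reachable; adjust for your setup.
REQUIRED_TOOLS = ["formatdb", "mpirun", "sort"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def sort_supports_parallel(sort_cmd="sort"):
    """Return True if `sort --parallel=1` exits cleanly on empty input."""
    try:
        result = subprocess.run(
            [sort_cmd, "--parallel=1"],
            input=b"",
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The sort binary itself is missing or not executable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"not on PATH: {tool}")
    if not sort_supports_parallel():
        print("sort lacks --parallel; remove it in taxonomy_lookup.pl")
```

Finding mpirun on the PATH does not prove it is configured properly, so treat a clean result as a necessary check rather than a sufficient one.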
I will update this post as and when I find more issues with the pipeline.
I forgot to mention that the configuration file that gets created will contain the wrong path for the reference sequences. For example, in your case the path to the db is dbname=/reference/taxonomy/gi_taxid_nucl.db, but I am sure it is not /reference/. Instead, it should be either reference/ or <full path>/reference/...
Change all the paths carefully in the configuration file.
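To catch such wrong paths before a run, something like the sketch below may help. It assumes the configuration file consists of plain key=value lines, as in the dbname example, and that any value containing a slash is a path. Both are assumptions, and find_missing_paths is a hypothetical helper, not something SURPI ships.

```python
import os
import sys


def find_missing_paths(config_text, exists=os.path.exists):
    """Return (key, path) pairs whose path-like value does not exist."""
    missing = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")
        # Assumption: only values containing a slash are file system paths.
        if "/" in value and not exists(value):
            missing.append((key.strip(), value))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_paths.py <configuration file>
    with open(sys.argv[1]) as handle:
        for key, path in find_missing_paths(handle.read()):
            print(f"{key}: {path} does not exist")
```

Run it from the directory where you start the pipeline, since a relative path such as reference/ resolves against the current working directory.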