using DiscoSNP++ gives error concerning $TERM and -T
2.7 years ago

Hi everyone,

I'm trying to run DiscoSNP++ on data from the Tara Oceans Expedition, but I ran into a problem that I'm having trouble solving. The input consists of two very large .gz files (study accession PRJEB4352). The outputs I need are a .fa file and a .vcf file, but neither is produced. This is the script I used for the job, which took about 40 minutes to complete:

#!/usr/bin/env bash

#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -l pmem=8gb
#PBS -A default_project

cd /vsc-hard-mounts/leuven-data/341/vsc34135/Laso-Jadart
source /vsc-hard-mounts/leuven-data/341/vsc34135/miniconda3/etc/profile.d/conda.sh
conda activate discosnp_env

curl -o ERR868369_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR868/ERR868369/ERR868369_1.fastq.gz
curl -o ERR868369_2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR868/ERR868369/ERR868369_2.fastq.gz
run_discoSnp++.sh -r fof_1.txt –k 51 -b 1 -T

With fof_1.txt being: ERR868369_1.fastq.gz ERR868369_2.fastq.gz
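
One detail worth checking in the command above: as displayed here, the dash in front of `k` is an en dash (`–`) rather than an ASCII hyphen (`-`). This may only be an artifact of how the post was rendered, but if the character is really in the job script, the shell passes `–k` as a literal word and the option parser will most likely not recognise it. A minimal sketch for spotting such characters, assuming a `grep` that supports POSIX character classes; the function name `find_non_ascii` is hypothetical:

```shell
# find_non_ascii FILE: list the lines of FILE that contain bytes outside plain ASCII
# (for example a typographic en dash pasted from a web page or word processor).
# Exit status is 0 when such lines exist and 1 when the file is clean.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

It could be called as `find_non_ascii job_script.pbs`, where `job_script.pbs` is a placeholder for the actual script name.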

This is part of the error job output: [image not preserved]

And this is part of the job output: [image not preserved]

I'm quite new to making scripts and working with the VSC so any help is appreciated.

DiscoSNP VSC bash • 1.2k views
2.7 years ago

Hi there.

The tput error occurs because the TERM variable, which names the terminal type to use, is not set in your job environment (see stack_overflow_post).

It may be solved by setting this variable before running the discoSnp script.

TERM=xterm
run_discoSnp++.sh -r fof_1.txt -k 51 -b 1 -T
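
One caveat about the assignment above, offered as a sketch rather than a confirmed diagnosis: in a shell script, a plain `TERM=xterm` only sets a shell variable. If TERM was not already part of the environment (as is common in non-interactive batch jobs), child processes such as `tput` will not see it unless it is exported. The snippet below illustrates the difference; `printenv` is used only to inspect what a child process receives.

```shell
#!/usr/bin/env bash
# Simulate a batch job in which TERM is absent from the environment.
unset TERM

# Plain assignment: visible inside this shell only, not passed to child processes.
TERM=xterm
without_export=$(printenv TERM || echo "not in environment")

# Exported assignment: child processes (e.g. tput) inherit the value.
export TERM=xterm
with_export=$(printenv TERM || echo "not in environment")

echo "plain assignment:    $without_export"
echo "exported assignment: $with_export"
```

In the job script this would amount to writing `export TERM=xterm` on the line before the `run_discoSnp++.sh` call.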

Please let me know if this also solves the graph construction issue.

Also be sure that fof_1.txt contains two lines:

ERR868369_1.fastq.gz
ERR868369_2.fastq.gz

Depending on how you wish to treat these two files, please read the documentation (section "Input File of file format").
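
A minimal sketch of creating such a file of files with one path per line, using the file names from this thread. It only covers the case where each file is listed on its own line; how read sets can be grouped is described in the documentation and is not shown here.

```shell
# Write one read-file path per line into the file of files.
printf '%s\n' ERR868369_1.fastq.gz ERR868369_2.fastq.gz > fof_1.txt

# Sanity check: count the lines in the file of files.
line_count=$(grep -c '' fof_1.txt)
echo "fof_1.txt contains $line_count line(s)"
```

The paths are interpreted relative to the directory in which the job runs, so the `cd` at the top of the job script matters.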

I hope this helps,

Pierre


Hi Pierre.

Thanks for the fast response. I've tried your solution, together with moving to a smaller dataset and adding some extra "just to be safe" code. It doesn't seem to work, though: I still get the same errors. Here's my tweaked script:

#!/usr/bin/env bash

#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -l pmem=4gb
#PBS -A default_project

cd /vsc-hard-mounts/leuven-data/341/vsc34135/Ofunato
source /vsc-hard-mounts/leuven-data/341/vsc34135/miniconda3/etc/profile.d/conda.sh
conda activate discosnp_env

curl -o ofunato1.fastq.bz.1 https://ddbj.nig.ac.jp/public/ddbj_database/dra/fastq/DRA005/DRA005744/DRX084576/DRR090871_1.fastq.bz2
bzcat ofunato1.fastq.bz.1 | gzip -c >ofunato1.1.gz
curl -o ofunato1.fastq.bz.2 https://ddbj.nig.ac.jp/public/ddbj_database/dra/fastq/DRA005/DRA005744/DRX084576/DRR090871_2.fastq.bz2
bzcat ofunato1.fastq.bz.2 | gzip -c >ofunato1.2.gz

TERM=xterm
run_discoSnp++.sh -r ofunato_1.txt -T

with ofunato_1.txt:

ofunato1.1.gz
ofunato1.2.gz

The job takes less than a minute to run, so it must be something in the very beginning that goes wrong.
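
When a job ends this quickly, a small pre-flight check at the top of the job script can help narrow down the cause. The sketch below is an assumption about what is worth checking, not a confirmed diagnosis: it reports whether the DiscoSnp++ wrapper is found on `PATH` after `conda activate`, and whether every file listed in the file of files exists and is non-empty. The function name `check_fof` is hypothetical.

```shell
# check_fof FILE: report every path listed in FILE that is missing or empty.
# Returns 0 when all listed files are usable, 1 otherwise.
check_fof() {
    local fof=$1 status=0 path
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue            # skip blank lines
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            status=1
        fi
    done < "$fof"
    return "$status"
}

# Report where the wrapper script is found (or that it is not on PATH).
command -v run_discoSnp++.sh || echo "run_discoSnp++.sh not found on PATH" >&2
```

In the script above it could be used as `check_fof ofunato_1.txt || exit 1`, placed just before the `run_discoSnp++.sh` line.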


Hello Paulien

I have no issues with either the master branch or the latest release (2.6.2) on a macOS machine, using exactly your command lines.

###############################################################
 #################### DISCOSNP++ FINISHED ######################
 ###############################################################
DiscoSnp++ total time in seconds: 277
################################################################################################################
 fasta of predicted variant is "discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa"
 Ghost VCF file (1-based) is "discoRes_k_31_c_3_D_100_P_3_b_0_coherent.vcf"

Could you try the latest release, https://github.com/GATB/DiscoSnp/releases? You may either install from source or use one of the precompiled versions (macOS or Linux).

Depending on your feedback we will update the conda package.

I hope this helps.


Hi Pierre,

I tried installing the latest version via the sources but I get several error messages.

git clone --recursive https://github.com/GATB/DiscoSnp.git

gives me:

fatal: reference is not a tree: 7a2202e751a11a5a7125eb756b6ab2d285fe5f5f
Unable to checkout '7a2202e751a11a5a7125eb756b6ab2d285fe5f5f' in submodule path 'thirdparty/gatb-core'

and then it just stops and goes back to the command line. I continued with

sh INSTALL

Which gives me several errors:

fatal: reference is not a tree: 7a2202e751a11a5a7125eb756b6ab2d285fe5f5f
Unable to checkout '7a2202e751a11a5a7125eb756b6ab2d285fe5f5f' in submodule path 'thirdparty/gatb-core'
INSTALL: line 18: cmake: command not found
make: *** No targets specified and no makefile found.  Stop.
Running simple test...


 Running discoSnp++ 2.3.X, in directory /vsc-hard-mounts/leuven-data/341/vsc34135/Ofunato/DiscoSnp with following parameters:
         read_sets=fof.txt
         prefix=discoRes_k_31_c_3
         c=3
         C=2147483647
         k=31
         b=0
         d=1
         D=100
         s=
         P=3
         p=discoRes
         G=
         e=
         x=
         starting date=Wed Apr 13 12:03:54 CEST 2022


../run_discoSnp++.sh: line 503: /vsc-hard-mounts/leuven-data/341/vsc34135/Ofunato/DiscoSnp/build/bin/read_file_names: No such file or directory
 ############################################################
 #################### GRAPH CREATION  #######################
 ############################################################
/vsc-hard-mounts/leuven-data/341/vsc34135/Ofunato/DiscoSnp/build/ext/gatb-core/bin/dbgh5 -in fof.txt_discoRes_k_31_c_3_D_100_P_3_b_0_removemeplease -out discoRes_k_31_c_3 -kmer-size 31 -abundance-min 3 -abundance-max 2147483647 -solidity-kind one -verbose 1 -skip-bcalm -skip-bglue -no-mphf -histo-max 1000000
../run_discoSnp++.sh: line 521: /vsc-hard-mounts/leuven-data/341/vsc34135/Ofunato/DiscoSnp/build/ext/gatb-core/bin/dbgh5: No such file or directory
 there was a problem with graph construction$ reset
diff: discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa: No such file or directory
*** Test: FAILURE on diff .fa

I honestly don't really know how to install DiscoSNP using a precompiled version.

PS: is it right that I don't get the latest version using conda? I get version 1.5.3.

2.6 years ago

fatal: reference is not a tree: 7a2202e751a11a5a7125eb756b6ab2d285fe5f5f

This is due to your git version.

  • With git 1.8.3.1 I get the same error.
  • With git 2.19.1 it works fine.

About the rest of your errors:

  • Nothing could be compiled because not all submodules were downloaded.
  • INSTALL: line 18: cmake: command not found:

    • You need cmake to compile discoSnp (as with most tools of this kind).
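
The two prerequisites mentioned above can be checked before running `sh INSTALL`. This is a minimal sketch: the helper name `version_ge` is hypothetical, it relies on `sort -V` (available in GNU coreutils), and 2.19.1 is used only because it is the version reported to work in this thread, not a documented minimum.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

known_good_git=2.19.1   # version reported to work in this thread (not an official minimum)

if command -v git >/dev/null 2>&1; then
    git_version=$(git --version | awk '{print $3}')
    if version_ge "$git_version" "$known_good_git"; then
        echo "git $git_version: at least as recent as $known_good_git"
    else
        echo "git $git_version: older than $known_good_git, the recursive clone may fail"
    fi
else
    echo "git not found on PATH"
fi

# cmake is required by the INSTALL script.
command -v cmake >/dev/null 2>&1 || echo "cmake not found on PATH"
```

With a recent enough git, an incomplete clone can usually be completed from inside the repository with `git submodule update --init --recursive` instead of cloning again.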
2.6 years ago

Hello Pierre,

I managed to run DiscoSnp++ now! The problem was that my conda environment wasn't set up properly. My apologies for the hassle.

I noticed that you co-wrote metaVaR, so I figured I could ask you my question(s) about it too. The preprocessing step doesn't seem to work. I uploaded the Perl script metavarFilter.pl and ran it on my new .vcf file with the -a option set to 5. It returns empty .txt files, so something goes wrong. The only message I get is

Use of uninitialized value $d[3] in pattern match (m//) at metavarFilter.pl line 47, <IN> line x.

with x going from 1 to 20. So it seems the $d[3] value needs to be defined/initialized, but I don't know what it refers to.
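
A possible first diagnostic, under the assumption (not verified against the metavarFilter.pl source) that `@d` holds the fields of one input line after splitting: if a line has fewer than four fields, `$d[3]` is undefined and Perl emits this kind of warning. The sketch below prints the number of tab-separated fields for the first lines of a file; the function name `field_counts` is hypothetical, and the tab delimiter is also an assumption.

```shell
# field_counts FILE [N]: print the number of tab-separated fields
# for each of the first N lines of FILE (default: 20).
field_counts() {
    awk -F'\t' -v n="${2:-20}" 'NR <= n { print NR ": " NF " field(s)" }' "$1"
}
```

It could be called as `field_counts my_variants.vcf`, with `my_variants.vcf` standing in for the actual file. Header lines of a VCF that start with `##` contain no tabs and will show up as a single field, which may be all that the warnings on the first lines reflect.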

Thanks in advance for all the help! Paulien
