MaSuRCA: invalid forward file for PE library
1
0
Entering edit mode
10 months ago
m.t.lorenc • 0

Hi, I ran into the following problem:

```
> singularity run --bind /scratch/masurca/:/data /pkg/containers/masurca_4.1.0.sif masurca /data/NRotund.cfg
Error line 17 of configuration file '/data/NRotund.cfg':
invalid forward file for PE library 'pe': '/data/V350181330_L03_R1.fastp.fq.gz' Bad file descriptor

> singularity run --bind /scratch/masurca/:/data /pkg/containers/masurca_4.1.0.sif masurca /data/NRotund.cfg
Error line 17 of configuration file '/data/NRotund.cfg':
invalid forward file for PE library 'pe': '/data/V350181330_L03_R1.fastp.fq' Bad file descriptor

> singularity run --bind /scratch/masurca/:/data /pkg/containers/masurca_4.1.0.sif masurca /data/NRotund.cfg
Error line 17 of configuration file '/data/NRotund.cfg':
invalid forward file for PE library 'pe': '/data/V350181330_L03_R1.fastp.fastq' Bad file descriptor

> singularity run --bind /scratch/masurca/:/data /pkg/containers/masurca_4.1.0.sif ls /data
NRotund.cfg  V350181330_L03_R1.fastp.fq.gz  V350181330_L03_R2.fastp.fq.gz  V350181330_L04_R1.fastp.fq.gz  V350181330_L04_R2.fastp.fq.gz  allPacBio.fasta  masurca_pbs.sh
```

My config file looks like this:

# example configuration file 

# DATA is specified as type {PE,JUMP,OTHER,PACBIO} and 5 fields:
# 1)two_letter_prefix 2)mean 3)stdev 4)fastq(.gz)_fwd_reads
# 5)fastq(.gz)_rev_reads. The PE reads are always assumed to be
# innies, i.e. --->.<---, and JUMP are assumed to be outties
# <---.--->. If there are any jump libraries that are innies, such as
# longjump, specify them as JUMP and specify NEGATIVE mean. Reverse reads
# are optional for PE libraries and mandatory for JUMP libraries. Any
# OTHER sequence data (454, Sanger, Ion torrent, etc) must be first
# converted into Celera Assembler compatible .frg files (see
# http://wgs-assembler.sourceforge.com)
DATA
#Illumina paired end reads supplied as <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads>
#if single-end, do not specify <reverse_reads>
#MUST HAVE Illumina paired end reads to use MaSuRCA
PE= pe 380 87  /data/V350181330_L03_R1.fastp.fastq  /data/V350181330_L03_R2.fastp.fastq
PE= pe 382 87  /data/V350181330_L04_R1.fastp.fq.gz  /data/V350181330_L04_R2.fastp.fq.gz
#Illumina mate pair reads supplied as <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads>
#JUMP= sh 3600 200  /FULL_PATH/short_1.fastq  /FULL_PATH/short_2.fastq
#pacbio OR nanopore reads must be in a single fasta or fastq file with absolute path, can be gzipped
#if you have both types of reads supply them both as NANOPORE type
PACBIO=/data/allPacBio.fasta
#NANOPORE=/FULL_PATH/nanopore.fa
#Other reads (Sanger, 454, etc) one frg file, concatenate your frg files into one if you have many
#OTHER=/FULL_PATH/file.frg
#synteny-assisted assembly, concatenate all reference genomes into one reference.fa; works for Illumina-only data
#REFERENCE=/FULL_PATH/nanopore.fa
END

PARAMETERS
#PLEASE READ all comments to essential parameters below, and set the parameters according to your project
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 0 if you have more than 15x coverage by long reads (Pacbio or Nanopore) or any other long reads/mate pairs (Illumina MP, Sanger, 454, etc)
USE_LINKING_MATES = 0
#specifies whether to run the assembly on the grid
USE_GRID=0
#specifies grid engine to use SGE or SLURM
GRID_ENGINE=SGE
#specifies queue (for SGE) or partition (for SLURM) to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=500000000
#use at most this much coverage by the longest Pacbio or Nanopore reads, discard the rest of the reads
#can increase this to 30 or 35 if your reads are short (N50<7000bp)
LHE_COVERAGE=25
#set to 0 (default) to do two passes of mega-reads for slower, but higher quality assembly, otherwise set to 1
MEGA_READS_ONE_PASS=0
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms 
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler.  do not worry about performance, number or processors or batch sizes -- these are computed automatically. 
#CABOG ASSEMBLY ONLY: set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS =  cgwErrorRate=0.15
#CABOG ASSEMBLY ONLY: whether to attempt to close gaps in scaffolds with Illumina  or long read data
CLOSE_GAPS=1
#number of cpus to use, set this to the number of CPUs/threads per node you will be using
NUM_THREADS = 125
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*20
JF_SIZE = 52000000000
#ILLUMINA ONLY. Set this to 1 to use SOAPdenovo contigging/scaffolding module.  
#Assembly will be worse but will run faster. Useful for very large (>=8Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
#If you are doing Hybrid Illumina paired end + Nanopore/PacBio assembly ONLY (no Illumina mate pairs or OTHER frg files).  
#Set this to 1 to use Flye assembler for final assembly of corrected mega-reads.  
#A lot faster than CABOG, AND QUALITY IS THE SAME OR BETTER. 
#Works well even when MEGA_READS_ONE_PASS is set to 1.  
#DO NOT use if you have less than 15x coverage by long reads.
FLYE_ASSEMBLY=1
END

What did I miss?
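
A quick sanity check for a config like the one above is to parse its `PE=` lines and test whether each read file exists. The sketch below is only an illustration. The function name `check_pe_files` and its `container_dir`/`host_dir` parameters are made up for this example and are not part of MaSuRCA. The field layout (prefix, mean, stdev, forward reads, optional reverse reads) is taken from the comments in the config itself. Note that when it is run on the host it follows symbolic links the way the host sees them, so it only catches paths that are missing or misspelled there.

```python
import os


def check_pe_files(config_text, container_dir=None, host_dir=None):
    """Return (line_number, path) for every PE read file that is not a regular file.

    container_dir/host_dir are hypothetical helper parameters: if both are
    given, a path under container_dir (e.g. /data) is translated to the
    matching path under host_dir (e.g. /scratch/masurca) before the check.
    """
    missing = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        key, sep, value = line.partition("=")
        # Only uncommented "PE=" lines are of interest; "#PE=" has key "#PE".
        if not sep or key.strip() != "PE":
            continue
        # Fields: prefix, mean, stdev, forward reads, optional reverse reads.
        fields = value.split()
        for path in fields[3:5]:
            local = path
            if container_dir and host_dir:
                prefix = container_dir.rstrip("/")
                if path.startswith(prefix + "/"):
                    local = host_dir.rstrip("/") + path[len(prefix):]
            # isfile() follows symlinks and is False for dangling links.
            if not os.path.isfile(local):
                missing.append((lineno, path))
    return missing
```

With the bind used in this thread, a call could look like `check_pe_files(open("/scratch/masurca/NRotund.cfg").read(), "/data", "/scratch/masurca")`. An empty list means every `PE=` path resolves to a file on the host.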

masurca assembly illumina • 534 views
10 months ago

My guess is that `V350181330_L03_R1.fastp.fq.gz` is a symbolic link in `/data` whose target lies somewhere on the file system that is not bound into the container, so Singularity cannot see it. What does `singularity run --bind /scratch/masurca/:/data /pkg/containers/masurca_4.1.0.sif file /data/V350181330_L03_R1.fastp.fq.gz` say? It might say something like `broken symbolic link to some_location`.
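
If `file` is not available inside the container, the same diagnosis can be approximated from the host. The following is a minimal sketch using only the Python standard library. The function name `find_unreachable_symlinks` is made up for this example. It assumes that a link is only safe when it is relative, its target exists, and the target stays inside the bound directory. It inspects only the first hop of a chain of links for absoluteness.

```python
import os
from pathlib import Path


def find_unreachable_symlinks(bound_dir):
    """Map link name -> resolved host target for symlinks in bound_dir that
    are unlikely to resolve once bound_dir is mounted at another path.

    A link is flagged if it is absolute (host paths change inside the
    container), if its target lies outside bound_dir, or if it is dangling.
    """
    root = Path(os.path.realpath(bound_dir))
    flagged = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_symlink():
            continue
        raw = os.readlink(entry)                 # link text, first hop only
        target = Path(os.path.realpath(entry))   # fully resolved on the host
        inside = root in target.parents
        if os.path.isabs(raw) or not inside or not target.exists():
            flagged[entry.name] = str(target)
    return flagged
```

For the setup in this thread it would be called on the host directory, for example `find_unreachable_symlinks("/scratch/masurca")`. Any name it returns is a candidate for the "invalid forward file" error once that directory is mounted as `/data`.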

singularity run  --bind masurca_4.1.0.sif file /data/V350181330_L03_R1.fastp.fq.gz
FATAL:   "file": executable file not found in $PATH

However, all the files were indeed symbolic links. Thank you, it works when I use the real files instead of links.
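
One way to get to "without links" is to replace each symbolic link in the bound directory with a copy of the file it points to. The sketch below is an illustration under that assumption, not necessarily what was done here. The function name `materialize_symlinks` and its `dry_run` flag are hypothetical. Copying duplicates the data, so for large read files it needs matching free disk space.

```python
import os
import shutil
from pathlib import Path


def materialize_symlinks(directory, dry_run=True):
    """Replace symlinks to regular files in `directory` with real copies.

    Returns a list of (link_name, resolved_target). With dry_run=True
    (the default) nothing is modified and the list is only a plan.
    """
    replaced = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_symlink():
            continue
        target = Path(os.path.realpath(entry))
        if not target.is_file():
            continue  # dangling link or link to a directory: leave it alone
        replaced.append((entry.name, str(target)))
        if not dry_run:
            # Copy next to the link first, then rename over the link so the
            # link is only replaced once the copy has completed.
            tmp = entry.with_name(entry.name + ".tmp_copy")
            shutil.copy2(target, tmp)
            os.replace(tmp, entry)
    return replaced
```

Calling `materialize_symlinks("/scratch/masurca")` first shows what would be copied, and adding `dry_run=False` performs the replacement. The original targets are left untouched.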
