9.5 years ago
mafireyi
I am trying to use ALLPATHS for de novo assembly.
My data summary looks like the following.
Hiseq_Run12_17122014 25GB Mate-pair (Size selected to 3KB)
Hiscan_Run20_12022012 15GB Paired-end (Nextera V1) 180bp insert
Hiscan_Run19_17102012 17GB PE (Nextera V2) 500bp insert
Hiscan_Run15_12042012 6,5GB Paired-end (Nextera V1) 180bp insert
Hiscan_Run14_22032012 3,94GB Paired-end (Nextera V1) 380bp insert
Hiscan_Run12_01032012 3,75GB Paired-end (Nextera V1) 380bp insert
Hiscan_Run5_08092011 3,58GB Single-end(Nextera V1) 380bp insert
Hiscan_Run4re_26072011 1,3GB Single-end(Nextera V1) 180bp insert
Hiseq_Run14_150313 XXGb Paired end 250bp insert size
I used the Hiseq_Run14_150313 paired-end reads as the fragment library and the Hiseq_Run12_17122014
mate pairs as the jumping library in my CSV files. I keep getting the following error when I run PrepareAllPathsInputs.pl.
Here's my PBS script:
#!/bin/bash
#PBS -N PrepareAllpaths
#PBS -q batch
#PBS -l nodes=1:ppn=16
cd $PBS_O_WORKDIR
mkdir -p NewGuava/data
#export PATH=/scratch/sysusers/godwin/allpaths-bin/bin:$PATH
/scratch/sysusers/godwin/allpaths-bin/bin/PrepareAllPathsInputs.pl DATA_DIR=$PBS_O_WORKDIR/NewGuava/data PLOIDY=2 IN_GROUPS_CSV=in_groups.csv IN_LIBS_CSV=in_libs.csv OVERWRITE=True
exit 0
The error I see:
Call to new failed, memory usage before call = 17169108k.
AND
**** 2015-06-29 13:10:03 (CG): ConvertToFastbQualb.pl failed for group 'paired_ends'.
---- 2015-06-29 13:10:04 (CG): Importing group 'mate_ends'.
Please assist. What might be the problem?
It would help if you gave us the exact command you used.
Try adding a memory usage PBS directive explicitly to the PBS header.
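As a rough sketch of that suggestion, a Torque/PBS header with an explicit memory request could look like the one below. The first four lines come from the job script in the question. The `mem=64gb` value is only a placeholder assumption, not a measured requirement, and some clusters expect a different resource name (for example `vmem` or `pmem`), so check the local scheduler documentation.

```shell
#!/bin/bash
#PBS -N PrepareAllpaths
#PBS -q batch
#PBS -l nodes=1:ppn=16
#PBS -l mem=64gb
# The mem value above is an assumed placeholder; pick one that the node can
# actually provide and that exceeds the usage reported in the error message.
cd "${PBS_O_WORKDIR:-.}"
```

If the request is larger than any node in the queue can satisfy, the job will typically sit in the queue instead of starting, so it is worth checking the node sizes first.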
Thanks, I have tried that. I will see the results tomorrow.
Wow, preparing datasets for ALLPATHS shouldn't take that long (unless you have tons of data). Also, ALLPATHS performs way faster on Intel than on AMD processors, FYI (we are talking 20 hrs vs. 120 hrs here).
I have about 40 GB in the fragment library and 25 GB in the mate-pair library. Is that considered tons of data? I have about 100x coverage.
I had a total of 86 GB (compressed data: 36 GB PE + 50 GB MP), with a little over 35X coverage. Preparing the dataset took 127 mins of wall time (32 CPUs, 512 GB memory requested), whereas the actual assembly needed 565 GB RAM and 32 procs, and ran for 6166.73 mins (both steps on an AMD machine). It was a different story on the Intel machine!
Interesting fact to know. Thanks.
Try adding
ulimit -s unlimited
to your PBS script. I know the ALLPATHS team recommends it, but I don't know what it does :) Also, make sure the fastq files have an fq or fastq extension (gzipped or uncompressed). There should be no spaces after the last , in either of the CSV files, and a space for each empty field, e.g.:
2000bp, trialrun, genspp, jump, 1, , , 2000, 500, outward, ,
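For the `ulimit` suggestion above: `ulimit -s unlimited` asks the shell to remove its soft limit on stack size, and the setting is inherited by every process the shell starts afterwards, so it has to appear before the PrepareAllPathsInputs.pl call. The sketch below is one way to do that defensively. The function name `raise_stack_limit` is made up for illustration, and the fallback warning is an assumption about how you might want the job to behave when the cluster's hard limit forbids the change.

```shell
#!/bin/bash
# Try to lift the soft stack-size limit for this shell and its child processes.
# The call fails when the administrator's hard limit is lower, so warn
# instead of aborting the job.
raise_stack_limit() {
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "warning: could not set unlimited stack; limit stays at $(ulimit -s)" >&2
    fi
}

raise_stack_limit
echo "stack size limit now: $(ulimit -s)"
# The PrepareAllPathsInputs.pl command from the job script would follow here.
```

Printing the resulting limit into the job log makes it easy to confirm afterwards whether the setting actually took effect on the compute node.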
Oh, I saw your response late. My fastq files have fastq.gz extensions. Will it fail again?
Oh, I just realised you said gzipped or uncompressed. I thought that said gunzipped.
Sorry for the confusion. I meant compressed or uncompressed (fastq.gz or fastq)! I normally put it like this:
103, 2000bp, /home/path/to/fastqfiles/2000bp/some_saple_number_R?.fastq.gz
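The two formatting points raised in this thread (file extensions, and no whitespace after the final comma) can be checked before submitting the job. The helpers below are a minimal sketch, not part of ALLPATHS: the function names `has_fastq_ext` and `check_trailing_space` are hypothetical, and the checks cover only what was described above.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; the names are illustrative only.

# Succeeds if the path ends in .fq or .fastq, optionally followed by .gz.
# A wildcard such as "R?" earlier in the path does not affect the result.
has_fastq_ext() {
    case "$1" in
        *.fq|*.fastq|*.fq.gz|*.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "<line number>:<line>" for every line that has whitespace after
# its final comma, and returns 1 if any such line exists.
check_trailing_space() {
    if grep -nE ',[[:space:]]+$' "$1"; then
        return 1
    fi
    return 0
}
```

For example, `check_trailing_space in_libs.csv` and `check_trailing_space in_groups.csv` would list any offending lines, and `has_fastq_ext` can be called on each file path taken from the third column of in_groups.csv.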