Questions about using the Celera Genome Assembler for HGAP

9.4 years ago

tptacek3050 ▴ 70

This post is a follow-up to a previous post: FASTQC and PacBio reads

I am trying to use the PBcR pipeline of the Celera Genome Assembler (v8.3) to perform HGAP on PacBio reads (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR).

I've got the assembler installed, and I was able to successfully assemble the lambda genome in the example provided on the wiki page (see link above). I then tried running the assembler on my own PacBio reads using the following script:

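#!/bin/bash
#Stop at the first failing command or unset variable
set -euo pipefail

#Celera genome assembler binary directory
#($HOME is used because a "~" inside double quotes is not expanded)
CELERA="$HOME/wgs-8.3rc2/Linux-amd64/bin"

#Output directory
OUT="celera_output_3"

#Variables from parameters: reads (FASTQ), library name (-l), spec file
FILE=$1
NAME=$2
SPEC=$3

#Raw data directory
#RAW="raw_data_test_phage"

#Perl environment variables
export PERLLIB=~/perl/modules/lib/perl5
export PERL5LIB=~/perl/modules/lib/perl5

#Create output directory and switch to it
mkdir -p "$OUT/$NAME"
cd "$OUT/$NAME"

#Run assembler (genomeSize should be the expected genome size in bp)
"$CELERA/PBcR" -length 5000 -s "../../$SPEC" -l "$NAME" -fastq "../../$FILE" genomeSize=50000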
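Because PBcR can run for a while before a bad path shows up deep in a long log, it may help to check the inputs before launching it. The helper below is a hypothetical sketch (`check_inputs` is not part of Celera Assembler): it assumes PBcR lives in the directory passed as the first argument, and only checks that the files exist, are non-empty, and that the reads file starts with a FASTQ `@` header.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check inputs before launching PBcR.
# Usage: check_inputs <pbcr_bin_dir> <reads.fastq> <spec_file>
# Returns 0 if everything looks fine, 1 otherwise (messages go to stderr).
check_inputs() {
    local bindir=$1 fastq=$2 spec=$3
    local status=0 f

    # The PBcR wrapper must exist and be executable
    if [ ! -x "$bindir/PBcR" ]; then
        echo "PBcR not found or not executable in: $bindir" >&2
        status=1
    fi
    # Both input files must exist and be non-empty
    for f in "$fastq" "$spec"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    # A FASTQ file starts with an '@' header line
    if [ -s "$fastq" ] && [ "$(head -c 1 "$fastq")" != "@" ]; then
        echo "does not look like FASTQ: $fastq" >&2
        status=1
    fi
    return $status
}
```

In the script it would go just before the PBcR call, e.g. `check_inputs "$CELERA" "../../$FILE" "../../$SPEC" || exit 1`, since the paths are relative to the directory the script has just switched into.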

I do not get an asm.asm or asm.qc file, and I don't see any obvious errors in the log files. Then again, the log file that Celera Assembler produces is quite long, and I may be missing something. The structure of the output (files and directories) looks like this:

|-- [NAME]
|   |-- 0-mercounts
|   |-- 0-mertrim
|   |-- 0-overlaptrim
|   |-- 0-overlaptrim-overlap
|   |-- 1-overlapper
|   |-- 3-overlapcorrection
|   |-- 4-unitigger
|   |-- 5-consensus
|   |-- 5-consensus-coverage-stat
|   |-- 5-consensus-insert-sizes
|   |-- asm.gkpStore
|   |-- asm.gkpStore.err
|   |-- asm.gkpStore.errorLog
|   |-- asm.gkpStore.fastqUIDmap
|   |-- asm.gkpStore.info
|   |-- asm.ovlStore
|   |-- asm.ovlStore.err
|   |-- asm.ovlStore.list
|   |-- asm.tigStore
|   `-- runCA-logs
|-- [NAME].correction.err
|-- [NAME].correction.hist
|-- [NAME].fasta
|-- [NAME].fastq
|-- [NAME].frg
|-- [NAME].log
|-- [NAME].longest25.fastq -> [NAME].fastq
|-- [NAME].longest25.frg -> [NAME].frg
|-- [NAME].qual
`-- temp[NAME]
    |-- 1-overlapper
    |-- [NAME].frg
    |-- [NAME].spec
    |-- asm.eidToIID
    |-- asm.gkpStore.err
    |-- asm.gkpStore.errorLog
    |-- asm.gkpStore.fastqUIDmap
    |-- asm.gkpStore.info
    |-- asm.hist
    |-- asm.ignore
    |-- asm.iidToLen
    |-- asm.layout.err
    |-- asm.layout.hist
    |-- asm.layout.success
    |-- asm.ovlStore.err
    |-- asm.ovlStore.list
    |-- asm.seedlength
    |-- asm.split.allEdit
    |-- asm.split.uid
    |-- asm.toerase.err
    |-- asm.toerase.out
    |-- asm.toerase.uid
    |-- asm.totalInputBP
    |-- corrected.log
    |-- runCA-logs
    |-- runCorrection.sh
    `-- runPartition.sh
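The numbered directories correspond to runCA stages, so the last one present gives a rough idea of how far the run got; in the tree above they stop at 5-consensus*. Comparing against the directory of the successful lambda run should show which later stages are missing. A minimal sketch, where `list_stages` is a hypothetical helper name and stage directories are assumed to be named like `0-mercounts` or `5-consensus`, as above:

```shell
#!/bin/bash
# Hypothetical helper: list the numbered runCA stage directories
# (0-mercounts, 4-unitigger, 5-consensus, ...) in an assembly directory,
# sorted bytewise so the furthest stage reached comes last.
list_stages() {
    local dir=$1 d
    for d in "$dir"/[0-9]-*; do
        if [ -d "$d" ]; then               # ignore plain files and an unmatched glob
            printf '%s\n' "${d##*/}"       # keep only the directory name
        fi
    done | LC_ALL=C sort
}
```

Run on the inner [NAME] directory, `list_stages [NAME] | tail -n 1` would print `5-consensus-insert-sizes` for the tree above; running it on the lambda output lists the stages a complete run goes through.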

So my questions are as follows:

1. Why am I not getting an asm.asm (the assembly, I assume) or an asm.qc (assembly statistics) file?
2. If the assembly failed, where in the logs can I find an indication of why it failed?
3. The lambda example included a parameter called -partitions. What is this parameter? I couldn't find an explanation for it, and I didn't include it in my script.
4. The raw data that we received all had the suffix .subreads.fastq. Is there a post-processing step that needs to be run before I run the assembly?
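Related to the last question, a quick summary of the reads can rule out a trivial input problem and shows roughly how much data survives a given `-length` cutoff. This is a sketch, not a PBcR tool: `fastq_stats` is a hypothetical helper, it assumes standard four-line FASTQ records with the sequence on one line (worth verifying for your .subreads.fastq files), and "coverage" here is just total bases divided by the genome size you pass in.

```shell
#!/bin/bash
# Hypothetical helper: summarize a FASTQ file of raw reads.
# Assumes 4-line records (header, sequence, "+", qualities) with the
# sequence on a single line.
# Usage: fastq_stats <reads.fastq> [min_length] [genome_size_bp]
fastq_stats() {
    local fastq=$1 min_len=${2:-5000} genome=${3:-0}
    awk -v min="$min_len" -v g="$genome" '
        NR % 4 == 2 {                        # the sequence line of each record
            len = length($0)
            n++; bases += len
            if (len >= min) { nlong++; longbases += len }
        }
        END {
            # %.0f for base counts avoids integer overflow in some awks
            printf "reads=%d bases=%.0f reads_ge_%d=%d bases_ge_%d=%.0f",
                   n, bases, min, nlong, min, longbases
            # rough coverage = total bases / expected genome size
            if (g > 0) printf " coverage=%.1f", bases / g
            printf "\n"
        }' "$fastq"
}
```

With the values from the script above, that would be `fastq_stats reads.subreads.fastq 5000 50000`; the genome size argument should match whatever is passed as `genomeSize=`.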

Written 9.4 years ago by tptacek3050; updated 2.7 years ago by Ram.

Ram · Answer 1 · 2015-11-18

9.4 years ago

rhall ▴ 160

The assembly failed during the 5-consensus stage. Check the [NAME]/runCA-logs directory for the log of the specific task that failed. My guess is that it failed in utgcns, possibly because it ran out of memory.
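Along the same lines, a case-insensitive search of the assembly directory for common failure words can narrow down which file to read first. This is a rough heuristic, not a runCA feature: `find_failures` is a hypothetical helper and the word list is a guess. It leaves out "error" on purpose, because the runCA logs contain an "Error Rates:" header.

```shell
#!/bin/bash
# Hypothetical helper: print lines that look like failures anywhere under an
# assembly directory (runCA-logs, stage directories, *.err files).
# The word list is a heuristic guess, not something runCA defines.
find_failures() {
    local dir=$1
    # -r: recurse, -I: skip binary files such as the stores,
    # -i: ignore case, -n: show line numbers, -E: extended regex
    grep -rIinE 'fail|abort|killed|segmentation fault|bad_alloc|out of memory' "$dir"
}
```

Each hit is printed as `file:line:text`; if nothing matches, grep exits with status 1 and prints nothing, which is itself a hint that the process may have been stopped from outside (for example by a scheduler memory limit).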

Written 9.4 years ago by rhall; updated 5.4 years ago by Ram.

There was a _utgcnsfix file, but no _utgcns file. The contents of this file (1446144349_sipsey-compute-1-12.local_20669_utgcnsfix) were as follows:

CA version 8.3rc2 ($Id: utgcnsfix.C 4442 2013-10-04 14:33:50Z brianwalenz $).

Error Rates:
AS_OVL_ERROR_RATE 0.030000
AS_CNS_ERROR_RATE 0.100000
AS_CGW_ERROR_RATE 0.100000
AS_MAX_ERROR_RATE 0.400000

Current Working Directory:
/scratch/user/tptacek/Vikram/celera_output_4/H37Rv

Command:
/home/tptacek/wgs-8.3rc2/Linux-amd64/bin/utgcnsfix \
  -g /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.gkpStore \
  -t /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.tigStore 2 001 \
  -o /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/5-consensus/asm_001.fixes
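Each runCA-logs file records the exact command that was run, so a step can be re-run by hand to see output that may never have reached the log, and to measure its memory use. The sketch below only extracts that command; `logged_command` is a hypothetical helper, and it assumes the layout shown above (a `Command:` line followed by the command, continued with trailing backslashes).

```shell
#!/bin/bash
# Hypothetical helper: print the command recorded in a runCA-logs file
# (the lines after "Command:"), joining backslash-continued lines.
logged_command() {
    awk '
        /^Command:/ { found = 1; next }
        found {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop indentation
            if (line == "" && cmd == "") next     # skip blank lines before the command
            more = sub(/[ \t]*\\$/, "", line)     # 1 if the line ended with a backslash
            cmd = (cmd == "" ? line : cmd " " line)
            if (!more) { print cmd; exit }
        }
        END { if (more) print cmd }               # file ended mid-continuation
    ' "$1"
}
```

For example, `cmd=$(logged_command runCA-logs/1446144349_sipsey-compute-1-12.local_20669_utgcnsfix)` followed by `/usr/bin/time -v sh -c "$cmd"`: GNU time's `-v` report includes "Maximum resident set size", which shows the peak memory of the step. Re-running may overwrite the step's output file (the `-o` path), so copying the assembly directory first is safer.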

I browsed through the other files in this directory and didn't see any obvious error messages; all of them look similar to this one. The contents of the runCA-logs directory are:

-rw-r--r-- 1 tptacek genetics 1762 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20273_runCA
-rw-r--r-- 1 tptacek genetics  520 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20280_gatekeeper
-rw-r--r-- 1 tptacek genetics  430 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20283_gatekeeper
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20285_gatekeeper
-rw-r--r-- 1 tptacek genetics  456 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20288_gatekeeper
-rw-r--r-- 1 tptacek genetics  543 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20290_initialTrim
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20291_gatekeeper
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20293_meryl
-rw-r--r-- 1 tptacek genetics  606 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20295_meryl
-rw-r--r-- 1 tptacek genetics  475 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20297_estimate-mer-threshold
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20299_meryl
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20300_meryl
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20302_meryl
-rw-r--r-- 1 tptacek genetics  580 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20305_overlap_partition
-rw-r--r-- 1 tptacek genetics  760 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20321_overlapInCore
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20374_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20380_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  840 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20384_deduplicate
-rw-r--r-- 1 tptacek genetics  643 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20386_finalTrim
-rw-r--r-- 1 tptacek genetics  637 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20388_chimera
-rw-r--r-- 1 tptacek genetics  571 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20392_overlap_partition
-rw-r--r-- 1 tptacek genetics  744 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20408_overlapInCore
-rw-r--r-- 1 tptacek genetics  620 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20442_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  615 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20454_correct-frags
-rw-r--r-- 1 tptacek genetics  707 Oct 29 13:45 1446144343_sipsey-compute-1-12.local_20484_correct-olaps
-rw-r--r-- 1 tptacek genetics  537 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20632_overlapStore
-rw-r--r-- 1 tptacek genetics  747 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20635_bogart
-rw-r--r-- 1 tptacek genetics  524 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20652_gatekeeper
-rw-r--r-- 1 tptacek genetics  595 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20661_gatekeeper
-rw-r--r-- 1 tptacek genetics  527 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20663_tigStore
-rw-r--r-- 1 tptacek genetics  619 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20667_tigStore
-rw-r--r-- 1 tptacek genetics  590 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20669_utgcnsfix
-rw-r--r-- 1 tptacek genetics  605 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20674_tigStore
-rw-r--r-- 1 tptacek genetics  573 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20678_tigStore
-rw-r--r-- 1 tptacek genetics  561 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20682_gatekeeper
-rw-r--r-- 1 tptacek genetics  651 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20685_computeCoverageStat
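Since every file here was written at 13:45, the file names are more informative than the timestamps. They appear to follow `<epoch>_<host>_<pid>_<step>` (an assumption read off this listing), so sorting by epoch and PID gives the order in which the steps ran. `step_timeline` below is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: print runCA-logs entries in the order they ran.
# Assumes file names of the form <epoch>_<host>_<pid>_<step>; <step> may
# itself contain "_" (e.g. overlap_partition).
# Output: one "<epoch> <pid> <step>" line per log file, oldest first.
step_timeline() {
    local logdir=$1 f epoch host pid step
    for f in "$logdir"/*; do
        [ -f "$f" ] || continue            # skip the unexpanded glob if the directory is empty
        # read puts everything after the third "_" into the last variable
        IFS=_ read -r epoch host pid step <<< "${f##*/}"
        printf '%s %s %s\n' "$epoch" "$pid" "$step"
    done | LC_ALL=C sort -k1,1n -k2,2n     # by time, then by PID within the same second
}
```

For the listing above, `step_timeline [NAME]/runCA-logs | tail -n 1` would print `1446144349 20685 computeCoverageStat`, i.e. the run stopped right after the coverage statistics step; GNU `date -d @1446144349` converts the epoch to a readable time.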

Does any of this look unusual? In the meantime, I'll queue up another run with more memory.

Written 9.4 years ago by tptacek3050; updated 2.7 years ago by Ram.