Questions about using the Celera Genome Assembler for HGAP
9.1 years ago
tptacek3050 ▴ 70

This post is a followup to a previous post: FASTQC and PacBio reads

I am trying to use the PBcR pipeline of the Celera Genome Assembler (v8.3) to perform HGAP on PacBio reads (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR).

I've got the assembler installed, and I was able to successfully assemble the lambda genome in the example provided on the wiki page (see link above). I then tried running the assembler on my own PacBio reads using the following script:

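#!/usr/bin/env bash
#Arguments: FASTQ file, assembly name, spec file (paths relative to the launch directory)

#Celera genome assembler directory
#($HOME rather than "~", because a tilde inside quotes is not expanded)
CELERA="$HOME/wgs-8.3rc2/Linux-amd64/bin"

#Output directory
OUT="celera_output_3"

#Variables from parameters
FILE=$1
NAME=$2
SPEC=$3

#Raw data directory
#RAW="raw_data_test_phage"

#Perl environment variable
export PERLLIB=~/perl/modules/lib/perl5
export PERL5LIB=~/perl/modules/lib/perl5

#Create output directory and switch to it (stop if that fails)
mkdir -p "$OUT/$NAME"
cd "$OUT/$NAME" || exit 1

#Run assembler (FILE and SPEC are resolved relative to the launch directory)
"$CELERA/PBcR" -length 5000 -s "../../$SPEC" -l "$NAME" -fastq "../../$FILE" genomeSize=50000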

I do not get an asm.asm or asm.qc file, and I don't see any obvious errors in the log files. Then again, the log that the Celera assembler produces is quite long, so I may be missing something. The structure of the output (i.e., files and directories) looks like this:

|-- [NAME]
|   |-- 0-mercounts
|   |-- 0-mertrim
|   |-- 0-overlaptrim
|   |-- 0-overlaptrim-overlap
|   |-- 1-overlapper
|   |-- 3-overlapcorrection
|   |-- 4-unitigger
|   |-- 5-consensus
|   |-- 5-consensus-coverage-stat
|   |-- 5-consensus-insert-sizes
|   |-- asm.gkpStore
|   |-- asm.gkpStore.err
|   |-- asm.gkpStore.errorLog
|   |-- asm.gkpStore.fastqUIDmap
|   |-- asm.gkpStore.info
|   |-- asm.ovlStore
|   |-- asm.ovlStore.err
|   |-- asm.ovlStore.list
|   |-- asm.tigStore
|   `-- runCA-logs
|-- [NAME].correction.err
|-- [NAME].correction.hist
|-- [NAME].fasta
|-- [NAME].fastq
|-- [NAME].frg
|-- [NAME].log
|-- [NAME].longest25.fastq -> [NAME].fastq
|-- [NAME].longest25.frg -> [NAME].frg
|-- [NAME].qual
`-- temp[NAME]
    |-- 1-overlapper
    |-- [NAME].frg
    |-- [NAME].spec
    |-- asm.eidToIID
    |-- asm.gkpStore.err
    |-- asm.gkpStore.errorLog
    |-- asm.gkpStore.fastqUIDmap
    |-- asm.gkpStore.info
    |-- asm.hist
    |-- asm.ignore
    |-- asm.iidToLen
    |-- asm.layout.err
    |-- asm.layout.hist
    |-- asm.layout.success
    |-- asm.ovlStore.err
    |-- asm.ovlStore.list
    |-- asm.seedlength
    |-- asm.split.allEdit
    |-- asm.split.uid
    |-- asm.toerase.err
    |-- asm.toerase.out
    |-- asm.toerase.uid
    |-- asm.totalInputBP
    |-- corrected.log
    |-- runCA-logs
    |-- runCorrection.sh
    `-- runPartition.sh
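
Given the layout above, a quick way to confirm which of the two expected files are absent is sketched below. It assumes the final outputs would land next to the asm.* stores inside the [NAME] subdirectory; `missing_outputs` is a hypothetical helper name, not part of the assembler.

```shell
#!/usr/bin/env bash
# Print the name of each expected assembly output that is missing or empty.
# Assumption: asm.asm and asm.qc are written into the directory given as $1.
missing_outputs() {
    local asm_dir="$1" f
    for f in asm.asm asm.qc; do
        # -s is true only when the file exists and has a size greater than zero
        if [ ! -s "$asm_dir/$f" ]; then
            echo "$f"
        fi
    done
}
```

Run from the output directory as `missing_outputs "$NAME"`; no output means both files are present and non-empty.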

So my questions are as follows:

  • Why am I not getting an asm.asm file (the assembly, I assume) or an asm.qc file (assembly statistics)?
  • If the assembly failed, where in the logs can I find an indication of why it failed?
  • The lambda example included a parameter called -partitions. What does this parameter do? I couldn't find an explanation for it, and I didn't include it in my script.
  • The raw data that we received all had the suffix .subreads.fastq. Is there a post-processing step that needs to be run before assembly?
Celera-assembler HGAP pacbio • 2.3k views
9.1 years ago
rhall ▴ 160

The assembly failed during the 5-consensus stage. Check the [NAME]/runCA-logs directory for the specific task that failed. My guess is that the failure is in utgcns and possibly memory related.
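
A minimal sketch of that log check, assuming a failed task leaves a recognizable keyword in its per-task log. The keyword list is a guess rather than anything the Celera Assembler documentation specifies, and `scan_runca_logs` is a hypothetical helper name. The second grep drops the "Error Rates" header that these logs print, which would otherwise match in every file.

```shell
#!/usr/bin/env bash
# Print "file:line:text" for every log line that mentions a likely failure keyword.
# Assumption: the keyword list below is illustrative, not an official list.
scan_runca_logs() {
    local log_dir="$1"
    # -r recurse, -i ignore case, -n show line numbers, -E extended regex
    grep -rinE 'error|fail|abort|segmentation fault|killed|bad_alloc|cannot allocate' "$log_dir" \
        | grep -vE 'Error Rates:|AS_[A-Z]+_ERROR_RATE' \
        | sort
}
```

Run from the output directory as `scan_runca_logs "$NAME/runCA-logs"`. A process killed for exceeding memory may leave nothing in these files, so an empty result does not rule out a memory problem.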


There was a _utgcnsfix file, but no _utgcns file. The contents of this file (1446144349_sipsey-compute-1-12.local_20669_utgcnsfix) were as follows:

CA version 8.3rc2 ($Id: utgcnsfix.C 4442 2013-10-04 14:33:50Z brianwalenz $).

Error Rates:
AS_OVL_ERROR_RATE 0.030000
AS_CNS_ERROR_RATE 0.100000
AS_CGW_ERROR_RATE 0.100000
AS_MAX_ERROR_RATE 0.400000

Current Working Directory:
/scratch/user/tptacek/Vikram/celera_output_4/H37Rv

Command:
/home/tptacek/wgs-8.3rc2/Linux-amd64/bin/utgcnsfix \
  -g /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.gkpStore \
  -t /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.tigStore 2 001 \
  -o /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/5-consensus/asm_001.fixes

I browsed through the other files in this directory and didn't see any obvious error messages; they all had the same layout as the one above. The contents of the runCA-logs directory look like this:

-rw-r--r-- 1 tptacek genetics 1762 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20273_runCA
-rw-r--r-- 1 tptacek genetics  520 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20280_gatekeeper
-rw-r--r-- 1 tptacek genetics  430 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20283_gatekeeper
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20285_gatekeeper
-rw-r--r-- 1 tptacek genetics  456 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20288_gatekeeper
-rw-r--r-- 1 tptacek genetics  543 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20290_initialTrim
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20291_gatekeeper
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20293_meryl
-rw-r--r-- 1 tptacek genetics  606 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20295_meryl
-rw-r--r-- 1 tptacek genetics  475 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20297_estimate-mer-threshold
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20299_meryl
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20300_meryl
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20302_meryl
-rw-r--r-- 1 tptacek genetics  580 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20305_overlap_partition
-rw-r--r-- 1 tptacek genetics  760 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20321_overlapInCore
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20374_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20380_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  840 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20384_deduplicate
-rw-r--r-- 1 tptacek genetics  643 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20386_finalTrim
-rw-r--r-- 1 tptacek genetics  637 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20388_chimera
-rw-r--r-- 1 tptacek genetics  571 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20392_overlap_partition
-rw-r--r-- 1 tptacek genetics  744 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20408_overlapInCore
-rw-r--r-- 1 tptacek genetics  620 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20442_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  615 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20454_correct-frags
-rw-r--r-- 1 tptacek genetics  707 Oct 29 13:45 1446144343_sipsey-compute-1-12.local_20484_correct-olaps
-rw-r--r-- 1 tptacek genetics  537 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20632_overlapStore
-rw-r--r-- 1 tptacek genetics  747 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20635_bogart
-rw-r--r-- 1 tptacek genetics  524 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20652_gatekeeper
-rw-r--r-- 1 tptacek genetics  595 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20661_gatekeeper
-rw-r--r-- 1 tptacek genetics  527 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20663_tigStore
-rw-r--r-- 1 tptacek genetics  619 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20667_tigStore
-rw-r--r-- 1 tptacek genetics  590 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20669_utgcnsfix
-rw-r--r-- 1 tptacek genetics  605 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20674_tigStore
-rw-r--r-- 1 tptacek genetics  573 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20678_tigStore
-rw-r--r-- 1 tptacek genetics  561 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20682_gatekeeper
-rw-r--r-- 1 tptacek genetics  651 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20685_computeCoverageStat

Does any of this look unusual? In the meantime, I'll queue up another run with increased memory.
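
The "_utgcnsfix but no _utgcns" observation can also be checked with a small helper, sketched here. It assumes log files are named `<timestamp>_<host>_<pid>_<tool>` as in the listing above; `has_stage_log` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Succeed if the log directory holds at least one file whose name ends in "_<tool>".
# Assumption: the tool name is the final underscore-separated part of the file name.
has_stage_log() {
    local log_dir="$1" tool="$2" f
    for f in "$log_dir"/*_"$tool"; do
        # An unmatched glob stays literal, so confirm the path really exists
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_stage_log "$NAME/runCA-logs" utgcns || echo "no utgcns log"` reports a missing utgcns log without being fooled by the utgcnsfix file, because the pattern must match the end of the name exactly.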
