splitting daligner jobs
7.0 years ago
Ric ▴ 440

Hi, I would like to run the HINGE assembler (https://github.com/HingeAssembler/HINGE). However, I wonder whether the necessary steps (https://github.com/HingeAssembler/HINGE/blob/master/demo/NCTC9657_demo/run.sh) could be split up, run on different computers, and merged afterwards?

fasta2DB NCTC9657 reads.pb.fasta

DBsplit NCTC9657

HPC.daligner NCTC9657 | bash -v

rm NCTC9657.*.NCTC9657.*.las
LAmerge NCTC9657.las NCTC9657.[0-9].las
DASqv -c100 NCTC9657 NCTC9657.las

Thank you in advance.

Michal

Assembly HINGE daligner pacbio • 1.6k views
7.0 years ago
chen ★ 2.5k

I have no experience with HINGE, but splitting and merging is usually not applicable to de novo assembly; that is why tools like Trinity require so much memory.

However, if you do want to split and run, you can use fastp to filter and split your FASTQ files.

7.0 years ago

The command

HPC.daligner NCTC9657

generates a shell script. The individual daligner calls in that script can be run in parallel on different machines, and the resulting .las files merged once they have been moved back.
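As a rough sketch of that idea, the generated script could be saved to a file and its daligner lines dealt out round-robin into one script per machine. This assumes each daligner call sits on its own line beginning with `daligner`; the function name `split_daligner_jobs` and the output file names are hypothetical, and any other lines in the script (comments, sort or merge steps) are deliberately left out and would need to be run separately afterwards.

```shell
#!/bin/bash
# Sketch only: distribute the daligner lines of a saved HPC.daligner script
# round-robin over N per-machine scripts.
# Usage: split_daligner_jobs <script> <n_machines> <output_prefix>
# Writes <output_prefix>.1.sh ... <output_prefix>.N.sh (hypothetical naming).
split_daligner_jobs() {
  local script=$1 n=$2 prefix=$3
  # Assumption: every daligner call is a single line starting with "daligner".
  grep -E '^daligner( |$)' "$script" |
    awk -v n="$n" -v p="$prefix" '{
      # Line 1 goes to part 1, line 2 to part 2, ..., wrapping around after n.
      f = sprintf("%s.%d.sh", p, (NR - 1) % n + 1)
      print > f
    }'
}
```

For example, after `HPC.daligner NCTC9657 > jobs.sh`, calling `split_daligner_jobs jobs.sh 3 part` would produce `part.1.sh`, `part.2.sh` and `part.3.sh`, each of which can be run with `bash` on a machine that has a copy of the database.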

DBsplit -s<double> NCTC9657

would split the DB into chunks of size at most s Mbp, and this would allow one to control the size of the job on each computer.
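To get a feel for how the block size affects the workload, the arithmetic can be sketched as below. It assumes the overlap step compares every block with itself and with every other block once, giving n(n+1)/2 block comparisons for n blocks; the function name `estimate_blocks` and the 1000 Mbp total are hypothetical, and the real block count may differ slightly because reads are not cut at block boundaries.

```shell
#!/bin/bash
# Sketch only: estimate block count and block-pair comparisons.
# Usage: estimate_blocks <total_Mbp> <block_size_Mbp>  (integers)
estimate_blocks() {
  local total=$1 size=$2
  # Ceiling division: number of blocks of at most <size> Mbp.
  local blocks=$(( (total + size - 1) / size ))
  # Assumed all-against-all comparison: each block vs itself and every other block once.
  local pairs=$(( blocks * (blocks + 1) / 2 ))
  echo "$blocks blocks, $pairs block-pair comparisons"
}

estimate_blocks 1000 200   # prints "5 blocks, 15 block-pair comparisons"
```

Halving the block size roughly doubles the number of blocks and so roughly quadruples the number of comparisons, but each comparison is correspondingly smaller.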


Hi, in the meantime I started to split the database with the following commands:

DBsplit -x500 -s200 plantDB 
DBdust plantDB

It created the following files:

-rw-rw----  1 lorencm Waterhouse_Team 476M Dec 18 15:23 .plantDB.idx
-rw-rw----  1 lorencm Waterhouse_Team  23K Dec 18 15:23 plantDB.db
-rw-rw----  1 lorencm Waterhouse_Team  25G Dec 18 15:23 .plantDB.bps
-rw-rw----  1 lorencm Waterhouse_Team 527M Dec 18 16:33 .plantDB.dust.data
-rw-rw----  1 lorencm Waterhouse_Team  96M Dec 18 16:33 .plantDB.dust.anno

How can I run HPC.daligner on different computers?

Thank you in advance


Hi,

I got many "daligner: Track 'dust' annotation file is junk" errors with the following commands:

DBsplit -x500 -s200 DB
DBdust DB 
HPC.daligner DB -T8 -mdust -H6973

I wrote the following script to submit each command from test.01.OVL as a separate job so that they run in parallel:

#!/bin/bash

while IFS='' read -r line || [[ -n "$line" ]]; do
  cmd=$line 

  #cat <<EOF
  qsub <<EOF
#!/bin/bash -l

#PBS -N HPCdaligner
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -l mem=80G
#PBS -l ncpus=4
#PBS -M m.lorenc@qut.edu.au
###PBS -m bea

cd \$PBS_O_WORKDIR

$cmd

EOF

done < "$1"

The script is run as follows: sh HPC.daligner_pbs.sh test.01.OVL
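One possible variation on such a submission script, as a sketch rather than a fix for the error above: it skips blank lines and `#` comment lines (assuming the command file may contain them) and lets the submit command be swapped for `cat` to get a dry run. The function names `emit_pbs_job` and `submit_jobs` are hypothetical, and the PBS header is shortened for illustration. Because the loop uses `[[ ... ]]`, it needs to be run with `bash` rather than a plain POSIX `sh`.

```shell
#!/bin/bash
# Sketch only: print a PBS job script for one command on stdout.
emit_pbs_job() {
  local cmd=$1
  cat <<EOF
#!/bin/bash -l
#PBS -N HPCdaligner
#PBS -l walltime=48:00:00
#PBS -j oe

cd \$PBS_O_WORKDIR

$cmd
EOF
}

# Usage: submit_jobs <command_file> [submit_command]
# submit_command defaults to qsub; pass "cat" to print the jobs instead.
submit_jobs() {
  local file=$1 submit=${2:-qsub}
  local line
  while IFS='' read -r line || [[ -n "$line" ]]; do
    # Skip lines that are empty or whitespace only.
    if [[ -z "${line//[[:space:]]/}" ]]; then continue; fi
    # Skip comment lines (assumed to start with '#').
    if [[ "$line" == \#* ]]; then continue; fi
    emit_pbs_job "$line" | "$submit"
  done < "$file"
}
```

Running `submit_jobs test.01.OVL cat` shows exactly what would be submitted, which makes it easier to check each command before calling `submit_jobs test.01.OVL` for real.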

Did I miss anything?

Thank you in advance.
