Question

Abyss genome assembly on several nodes

5.9 years ago

Igor Lalin • 0

Hi, I am running the following script:

    #!/bin/bash
    # Abyss assembly pipeline

    cores=40
    species='Favanaceum'
    Qcut=30

    # merge non-overlapping pairs with konnector and assemble at various k
    for k in `seq 26 10 126`;
    do
        konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
        mkdir ${species}-k$k
        abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
            lib='pe1 pe2' long='longa' \
            pe1='../kon${k}_reads_1.fq' pe2='../kon${k}_reads_2.fq' \
            se='../out_merged.fastq ../kon${k}_merged.fa' \
            longa='../05001-genome.fa'
    done

As you can see, it is relatively straightforward: after submitting with qsub -pe smp 40, I use 40 slots on one node. Would it be possible to run parallel jobs on different nodes?

That way, several assemblies with different k values could run at the same time, which would reduce the total run time.

How would you change my shell script to do this?

Thank you so much

You should check with your HPC folks on how to submit a job that needs 40+ cores; they will be able to help you better.

EDIT: Removed comments that recommended better formatting.

5.9 years ago by Ram • 44k

Thank you, RamRs. I appreciate it... will do!

5.9 years ago by Igor Lalin • 0

score 2 · Accepted Answer · 2019-01-24

Make a shell script that holds this part:

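#!/bin/bash
# The settings from the original script are repeated here,
# because each submitted job runs in its own shell.
cores=40
species='Favanaceum'
Qcut=30
k=$1   # k-mer size, passed as the first argument by the submit loop

konnector -j $cores -k $k -o kon$k out_reads_1.fastq out_reads_2.fastq
mkdir -p ${species}-k$k
abyss-pe -C ${species}-k$k name=$species-$k k=$k np=$cores q=$Qcut \
lib="pe1 pe2" long="longa" \
pe1="../kon${k}_reads_1.fq" pe2="../kon${k}_reads_2.fq" \
se="../out_merged.fastq ../kon${k}_merged.fa" \
longa="../05001-genome.fa" \
unitigs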

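Then simply submit one job per k using your loop, requesting the same 40 slots for each job and running it from the current directory so the relative paths resolve:

for k in $(seq 26 10 126)
do
    qsub -cwd -pe smp 40 <abyssScript> $k
done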
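
If your scheduler is Grid Engine (the qsub -pe smp 40 in the question suggests so, but that is an assumption), an array job is another way to get one task per k from a single submission. The sketch below is only a helper for the job script: resolve_k is a hypothetical function name, and it relies on Grid Engine exporting SGE_TASK_ID for array tasks. Whether the tasks actually land on different nodes depends on how your cluster is configured.

```shell
#!/bin/bash
# Sketch, assuming Grid Engine. Submit all k values in one go with:
#   qsub -cwd -pe smp 40 -t 26-126:10 <abyssScript>
# Each array task then gets its own SGE_TASK_ID: 26, 36, ..., 126.

# Pick k for this run: the array task ID if Grid Engine set one,
# otherwise the first command-line argument (as in the loop version).
resolve_k() {
    if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
        echo "$SGE_TASK_ID"
    elif [ -n "${1:-}" ]; then
        echo "$1"
    else
        echo "usage: pass k as an argument or submit with qsub -t" >&2
        return 1
    fi
}

# In the job script, replace the line k=$1 with:
#   k=$(resolve_k "$@") || exit 1
```

With this in place the same script works both ways: submitted from the loop it reads k from its argument, and submitted with -t it reads k from the task ID.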

If your genome is not that big, and since you do not have many input files, ABySS should run fairly quickly on a single (multi-core) node.

Extra tip: you can add the target 'unitigs' to your command line (I added it in the example above), which stops the ABySS pipeline after the unitigs are generated. That is already a good point at which to choose your 'best kmer'.
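
One possible way to compare the unitig stage across k values is to run abyss-fac (the assembly statistics tool shipped with ABySS) on every unitig file that has finished. This is a rough sketch, not part of the answer above: the function names are made up, and the paths assume abyss-pe writes its unitigs to <name>-unitigs.fa inside each per-k directory.

```shell
#!/bin/bash
species='Favanaceum'

# Print the expected unitig FASTA path for each k (path layout is an assumption).
unitig_paths() {
    for k in $(seq 26 10 126); do
        echo "${species}-k$k/${species}-$k-unitigs.fa"
    done
}

# Run abyss-fac on whichever unitig files exist and are non-empty.
compare_unitigs() {
    set --                                  # collect existing files in "$@"
    for f in $(unitig_paths); do
        [ -s "$f" ] && set -- "$@" "$f"
    done
    if [ $# -gt 0 ] && command -v abyss-fac >/dev/null 2>&1; then
        abyss-fac "$@"                      # one row of statistics per file
    else
        echo "no finished unitig files found, or abyss-fac is not on PATH" >&2
    fi
}

compare_unitigs
```

The resulting table has one row per k, so contiguity figures such as N50 can be compared side by side before continuing the pipeline with the k you prefer.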