Question

ABySS: optimizing the parameter k has been running for 7 days, still without results

6.1 years ago

macielrodriguez2 ▴ 50

Hi! I'm new to the field of bioinformatics, so please forgive my mistakes.

I'm running this script to determine which k-mer size to use for the assembly:

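#!/bin/bash

# Assemble once per k value (20, 28, ..., 84), each in its own directory.
for k in $(seq 20 8 90); do
    mkdir -p "k$k"
    abyss-pe np=6 -C "k$k" k="$k" name=cloroplast lib='pea' \
        pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq'
done
# Summarize contig statistics for every k.
abyss-fac k*/cloroplast-contigs.fa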

Each of the two read files (R1 and R2) is 144.8 GB. My computer has 12 CPUs and 156 GB of RAM.

ABySS has been running for nearly 7 days now, but I don't see any results (it has only created the folder k20, which is still empty).

Given the capacity of my computer and the size of my data, how much more time would this process need to complete? And is it possible at all with my current computer?

Thank you in advance :)

Assembly zea mays • 1.7k views

ADD COMMENT • link 6.1 years ago by macielrodriguez2 ▴ 50

Do you really have 145 GB of data for a chloroplast genome?

The 12 CPUs might be OK at first sight, but I'm afraid memory might be a bit limiting.

In any case, you should have had some output already. Is there no runtime output?

You could add v=-v to get more verbose output, which lets you check whether things are still progressing. To give some idea of runtimes: I ran that part on a ~5 Tb input dataset (estimated genome size 25 Gb) on 150 cores, which took ~3-4 days and used about 1.5 TB of RAM.
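As a rough sketch of the v=-v suggestion, the wrapper below runs a single k with verbose output. The function name `run_abyss` and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration; the abyss-pe arguments are taken from the script in the question, with only v=-v added.

```shell
#!/bin/bash

# Run one verbose abyss-pe assembly for a given k.
# run_abyss and DRY_RUN are illustrative names, not part of ABySS.
# With DRY_RUN=1 the command is only printed, not executed.
run_abyss() {
    local k=$1
    local runner=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        runner=echo
    else
        mkdir -p "k$k"
    fi
    # v=-v asks abyss-pe for verbose output so progress is visible.
    $runner abyss-pe np=6 v=-v -C "k$k" k="$k" name=cloroplast lib='pea' \
        pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq'
}

# Example (commented out so nothing runs by accident):
#   DRY_RUN=1 run_abyss 64                  # print the command only
#   run_abyss 64 2>&1 | tee k64.log         # run and keep a log
```

Sending the output through `tee` keeps a log per k, so you can check later whether the run is still making progress.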

ADD REPLY • link 6.1 years ago by lieven.sterck 15k

"estimated genome size 25 Gb"

What on earth are you assembling over there? oO

ADD REPLY • link 6.1 years ago by cschu181 ★ 2.8k

just some small-ish conifer genome

ADD REPLY • link 6.1 years ago by lieven.sterck 15k

The 145 GB is the size of each read file of the purple maize genome (ZM_R1.fastq and ZM_R2.fastq).

We want to use that data to assemble some contigs of its chloroplast genome. But perhaps we should not do this with the whole data set...

Thanks for your answer! I'll add v=-v to the command to see what's happening.
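One simple way to avoid using the whole data set is to take only the first N read pairs from each file. The sketch below assumes standard 4-line FASTQ records with R1 and R2 in the same order; the function name `take_first_pairs` and the output file names are hypothetical. The first reads of a run are not a random sample, so a dedicated tool such as `seqtk sample` (run with the same seed on both files) may be a better choice.

```shell
#!/bin/bash

# Copy the first N read pairs from two paired FASTQ files.
# Usage: take_first_pairs N in_R1 in_R2 out_R1 out_R2
# Assumes 4 lines per record and identical read order in both files.
take_first_pairs() {
    local n_lines=$(( $1 * 4 ))   # each FASTQ record spans 4 lines
    head -n "$n_lines" "$2" > "$4"
    head -n "$n_lines" "$3" > "$5"
}

# Example (commented out; the subset size and output names are placeholders):
#   take_first_pairs 5000000 ZM1_R1.fastq ZM1_R2.fastq sub_R1.fastq sub_R2.fastq
```

Taking the same number of records from both files keeps the pairs in sync, which abyss-pe needs for a paired-end library.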

ADD REPLY • link 6.1 years ago by macielrodriguez2 ▴ 50

Do I understand correctly that you have ~290 GB in total? If so, I think your available memory will not suffice, at least not for the large k-mers.

What is your read length, by the way? With current sequencing read lengths it's not really worth running the very small k-mers. I would advise focusing on values around 2/3 of the read length.
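The 2/3 rule of thumb can be turned into a short list of k values to try. This is a minimal sketch: the helper names `first_read_length` and `k_candidates`, the window of ±16, and the step of 8 are assumptions made for illustration. ABySS builds also have a compile-time maximum k, so check that your installation supports the larger values.

```shell
#!/bin/bash

# Length of the first read in a FASTQ file
# (the sequence is line 2 of the first 4-line record).
first_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Print candidate k values centred on about 2/3 of the read length.
# The +/-16 window and the step of 8 are illustrative choices.
k_candidates() {
    local read_len=$1
    local centre=$(( read_len * 2 / 3 ))
    local k
    for k in $(seq $(( centre - 16 )) 8 $(( centre + 16 ))); do
        # Keep only positive values shorter than the read.
        if [ "$k" -gt 0 ] && [ "$k" -lt "$read_len" ]; then
            echo "$k"
        fi
    done
}

# Example (commented out):
#   k_candidates "$(first_read_length ../../../ZM1/ZM1_R1.fastq)"
```

The printed values could replace the `seq 20 8 90` range in the original loop, so that no time is spent on very small k.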

ADD REPLY • link 6.1 years ago by lieven.sterck 15k