ABySS, optimizing parameter k: running for 7 days and still no results
5.7 years ago

Hi! I'm new to bioinformatics, so please forgive my mistakes.

I'm running this script to determine which k-mer size to use for the assembly:

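#!/bin/bash

# Assemble once per k-mer size (k = 20, 28, ..., 84), each in its own directory
for k in $(seq 20 8 90); do
    mkdir -p "k$k"
    abyss-pe np=6 -C "k$k" k="$k" name=cloroplast lib='pea' \
        pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq'
done
# Print contiguity statistics for every assembly
abyss-fac k*/cloroplast-contigs.fa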

My data is 144.8 GB per read file (two files, R1 and R2). My computer has 12 CPUs and 156 GB of RAM.

ABySS has been running for nearly 7 days now, but I don't see any results (it only created the folder k20, which is still empty).

Given my computer's capacity and the size of my data, how much longer should this take to complete? Or is it even feasible on my current computer?

Thank you in advance :)


Do you really have 145 GB of data for a chloroplast genome?

The 12 CPUs might be OK at first sight, but I'm afraid memory might be a bit limiting.

In any case, you should already have some output. Is there no runtime output at all?

You could add v=-v for more verbose output, which lets you check whether things are still progressing. To give some idea of runtimes: I ran that step on a ~5 Tb input dataset (estimated genome size 25 Gb) on 150 cores; it takes ~3-4 days and uses roughly 1.5 TB of RAM.
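A minimal sketch of how v=-v could be combined with a per-k log file, so progress can be followed with `tail -f k20/abyss.log` while the job runs. The function name `run_k` and the log name `abyss.log` are hypothetical; the library name and read paths are copied from the question, and the start/end timestamps are only there to get rough per-k runtimes.

```shell
#!/bin/bash
# Hypothetical helper: run ABySS for one k value with verbose output (v=-v)
# and write everything (stdout + stderr) to k<k>/abyss.log.
# Assumption: same library name and read paths as in the original script.
run_k() {
    local k="$1" status
    mkdir -p "k$k"
    {
        echo "start k=$k: $(date)"
        abyss-pe np=6 v=-v -C "k$k" k="$k" name=cloroplast lib='pea' \
            pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq'
        status=$?
        echo "end k=$k (exit status $status): $(date)"
    } > "k$k/abyss.log" 2>&1
    return "$status"
}
# Usage: for k in $(seq 20 8 90); do run_k "$k"; done
```

Running the k values one after another this way also means a stalled run is visible immediately in its own log instead of only as an empty directory.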


estimated genomesize 25Gb

What on earth are you assembling over there? oO


just some small-ish conifer genome


The 145 GB is per read file (ZM_R1.fastq and ZM_R2.fastq) from the purple maize genome.

We want to use that data to assemble some contigs of its chloroplast genome. But perhaps we should not do this with the whole data set...

Thanks for your answer! I'll add v=-v to the command to see what's happening.
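On the idea of not using the whole data set: chloroplast DNA is usually present in many copies per plant cell, so chloroplast reads tend to be abundant even in a fraction of a whole-genome run. A minimal sketch of taking the first N read pairs, assuming uncompressed FASTQ with 4-line records and R1/R2 files listing mates in the same order; the function name `subsample_pairs` and the value of N are hypothetical and would need tuning.

```shell
#!/bin/bash
# Hypothetical helper: keep the first N read pairs from a pair of FASTQ files.
# Each record is 4 lines and mates appear in the same order in R1 and R2,
# so taking the same number of lines from both files keeps pairs intact.
subsample_pairs() {
    local n_pairs="$1" r1="$2" r2="$3" out_prefix="$4"
    head -n "$((4 * n_pairs))" "$r1" > "${out_prefix}_R1.fastq"
    head -n "$((4 * n_pairs))" "$r2" > "${out_prefix}_R2.fastq"
}
# Usage: subsample_pairs 10000000 ../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq ZM1_sub
```

Taking reads from the start of the file is the simplest option but not a random sample; `seqtk sample` run with the same seed on both files is a common way to draw a random subset that preserves pairing.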


Do I understand correctly, then, that you have ~290 GB in total? If so, I think your available memory will not suffice, at least not for the large k-mers.

What is your read length, by the way? With current sequencing read lengths it's not really worth running the very small k-mers. I would advise focusing on k values around 2/3 of the read length.
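A minimal sketch of checking the read length and turning the "about 2/3 of read length" advice into a short list of k values, assuming uncompressed FASTQ with 4-line records. The function names, the 10,000-read sample, the ±16 window, the step of 8 and the floor of 16 are all hypothetical choices for illustration, not ABySS recommendations.

```shell
#!/bin/bash
# Hypothetical helper: mean length of the first 10,000 reads
# (the sequence is line 2 of every 4-line FASTQ record).
mean_read_length() {
    head -n 40000 "$1" | awk 'NR % 4 == 2 { sum += length($0); n++ } END { if (n) print int(sum / n) }'
}

# Hypothetical helper: k values in steps of 8 within +/-16 of 2/3 of the
# read length, never going below 16.
candidate_ks() {
    local len="$1"
    local centre=$(( 2 * len / 3 ))
    local lo=$(( centre - 16 ))
    if (( lo < 16 )); then lo=16; fi
    seq "$lo" 8 $(( centre + 16 ))
}
# Usage: for k in $(candidate_ks "$(mean_read_length ../../../ZM1/ZM1_R1.fastq)"); do ...; done
```

For 150 bp reads this gives k = 84, 92, 100, 108, 116, i.e. a handful of runs instead of the full 20-84 sweep.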
