5.7 years ago
macielrodriguez2
Hi! I'm new to the field of bioinformatics, so please forgive my mistakes.
I'm running this script to determine which k-mer size to use for the assembly:
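#!/bin/bash
# Try k = 20, 28, ..., 84 (seq FIRST INCREMENT LAST)
for k in $(seq 20 8 90); do
mkdir -p k$k
# -C runs the assembly inside k$k, so the read paths are relative to that directory
abyss-pe np=6 -C k$k k=$k name=cloroplast lib='pea' \
pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq'
done
# Compare contig statistics across all k values
abyss-fac k*/cloroplast-contigs.fa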
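For that last step, comparing the runs usually comes down to contiguity stats such as N50 from abyss-fac. Below is a minimal bash sketch that picks the assembly with the largest N50. It assumes abyss-fac prints a single tab-separated header row containing columns named N50 and name (please check this against your ABySS version). best_n50 is a hypothetical helper name, and the largest N50 alone is not always the best assembly.

```shell
#!/bin/bash
# Sketch: report the assembly with the largest contig N50 from abyss-fac output.
# Assumption: one tab-separated header row with "N50" and "name" columns,
# followed by one row per assembly file.
best_n50() {
  awk -F'\t' '
    NR == 1 {
      # locate the N50 and name columns from the header row
      for (i = 1; i <= NF; i++) {
        if ($i == "N50") n50 = i
        if ($i == "name") name = i
      }
      next
    }
    # keep the row with the highest N50 seen so far
    n50 && name && ($n50 + 0 > best) { best = $n50 + 0; best_name = $name }
    END { if (best_name != "") print best_name "\t" best }
  '
}

# Usage: abyss-fac k*/cloroplast-contigs.fa | best_n50
```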
My data are 144.8 GB for each of the two read files. My computer has 12 CPUs and 156 GB of RAM.
ABySS has been running for nearly 7 days now, but I don't see any results (it only created the folder k20, which is still empty).
Taking into account my computer's capacity and the size of my data, how much longer would this process need to finish? Or is it even possible with my current computer?
Thank you in advance :)
Do you really have 145Gb of data for a chloroplast genome?
The 12 CPUs might be OK at first sight, but I'm afraid memory might be a bit limiting.
In any case, you should already have had some output. Is there no runtime output?
You could add
v=-v
to the abyss-pe command to get more verbose output (which lets you check whether things are still progressing). To give some idea of runtimes: I ran that step on a ~5 Tb input dataset (estimated genome size 25 Gb) on 150 cores; it takes ~3-4 days and uses about 1.5 Tb of RAM.
What on earth are you assembling over there? oO
just some small-ish conifer genome
The 145 GB is the size of each read file of the purple maize genome (ZM_R1.fastq and ZM_R2.fastq).
We want to use that data to assemble some contigs of its chloroplast genome. But perhaps we should not do this with the whole dataset...
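If you do try a reduced dataset, one simple approach (a sketch, not something suggested in this thread) is to keep only the first N read pairs. Chloroplast DNA is usually present in many copies per cell, so a fraction of the reads often still gives high chloroplast coverage. This assumes uncompressed 4-line FASTQ files with R1 and R2 in the same order; subsample_pairs and the output file names are hypothetical. Note that the first reads are not a random sample; a tool such as seqtk sample can do random subsampling instead.

```shell
#!/bin/bash
# Sketch: write the first N read pairs of a paired FASTQ set to new files.
# Assumes uncompressed, 4-line-per-record FASTQ with R1/R2 in matching order.
# subsample_pairs is a hypothetical helper name.
subsample_pairs() {
  local n_pairs=$1 r1=$2 r2=$3 out1=$4 out2=$5
  local n_lines=$(( n_pairs * 4 ))   # 4 lines per FASTQ record
  head -n "$n_lines" "$r1" > "$out1"
  head -n "$n_lines" "$r2" > "$out2"
}

# Usage (hypothetical output names):
# subsample_pairs 10000000 ZM1_R1.fastq ZM1_R2.fastq sub_R1.fastq sub_R2.fastq
```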
Thanks for your answer! I'll add that v=-v to the code to see what's happening.
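A sketch of the loop with v=-v added and each run's output written to a log file, so progress can be followed with tail -f k20/abyss.log. run_k and the ABYSS_PE override are hypothetical additions: ABYSS_PE defaults to the real abyss-pe and only exists so the function can be dry-run (e.g. with ABYSS_PE=echo). The other parameters are copied from the original script.

```shell
#!/bin/bash
# Hypothetical override: defaults to the real abyss-pe; set ABYSS_PE=echo for a dry run.
ABYSS_PE=${ABYSS_PE:-abyss-pe}

# Sketch: run one k value with verbose output (v=-v) saved to k<k>/abyss.log.
run_k() {
  local k=$1
  mkdir -p "k$k"
  # 2>&1 sends both stdout and stderr to the log file
  "$ABYSS_PE" np=6 -C "k$k" k="$k" name=cloroplast lib='pea' v=-v \
    pea='../../../ZM1/ZM1_R1.fastq ../../../ZM1/ZM1_R2.fastq' \
    > "k$k/abyss.log" 2>&1
}

# Usage, from the same directory as the original script:
# for k in $(seq 20 8 90); do run_k "$k"; done
```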
Do I understand correctly, then, that you have ~290 Gb in total? If so, I think your available memory will not suffice, at least not for the large k-mers.
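To check that total in bases rather than file size, you can count reads and bases directly. A FASTQ file stores a header line, a "+" line and one quality character per base, so its size in bytes is more than twice its number of bases. This is a sketch assuming uncompressed 4-line FASTQ; count_fastq is a hypothetical helper name, and it reads the whole file, so expect it to take a while on ~145 GB inputs.

```shell
#!/bin/bash
# Sketch: count reads and bases in an uncompressed, 4-line-per-record FASTQ file.
# Line 2 of every record is the sequence; %.0f avoids integer overflow in some awks.
count_fastq() {
  awk 'NR % 4 == 2 { reads++; bases += length($0) }
       END { printf "%.0f reads\t%.0f bases\n", reads, bases }' "$1"
}

# Usage: for f in ZM1_R1.fastq ZM1_R2.fastq; do count_fastq "$f"; done
```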
What is your read length, btw? With today's sequencing, it's not really worth running the very small k-mers. I would advise focusing on the range around 2/3 of the read length.
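For the read length and the 2/3 suggestion, a sketch: estimate the mean read length from the first 10,000 reads and list k values around 2/3 of it. The sample size, the +/-16 window and the step of 8 (borrowed from the original seq 20 8 90 loop) are arbitrary choices, and mean_read_length and k_candidates are hypothetical helpers. For 150 bp reads, for example, this lists 84, 92, 100, 108 and 116.

```shell
#!/bin/bash
# Sketch: mean read length from the first 10,000 reads (assumes 4-line FASTQ).
mean_read_length() {
  head -n 40000 "$1" | awk 'NR % 4 == 2 { total += length($0); n++ }
                            END { if (n) printf "%d\n", total / n }'
}

# Candidate k values in steps of 8 within +/-16 of 2/3 of the read length.
# Assumes reads of roughly 100 bp or longer, so the window stays sensible.
k_candidates() {
  local len=$1
  local centre=$(( len * 2 / 3 ))
  seq $(( centre - 16 )) 8 $(( centre + 16 ))
}

# Usage: for k in $(k_candidates "$(mean_read_length ZM1_R1.fastq)"); do echo "$k"; done
```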