Hi all!
I am attempting to perform de novo assembly of sunflower with Supernova 2.0.0.
I am having some difficulty getting it to finish within the wallclock limits of the resources I am using: 48 hours on SDSC Comet (64 cores, 1.4 TB memory) and 72 hours on Savio here at UCB (16 cores, 512 GB memory).
I typically have not been including --maxreads in my scripts, on the assumption that this would produce the best-quality assembly, but that is not realistic given my wallclock limits. One question I have is whether I should limit the number of reads to what the sequencing company reported. Our sequences are from a HiSeq X, and the report says there are 261M reads. Are the sequencing company's "reads" different from the reads that --maxreads counts in Supernova?
Also, do you set --localcores and --localmem, or do you just let the program use whatever resources are available on the node?
I should also add that the genome is quite large, about 3.6 Gb, and I expect some heterozygosity.
Thanks!
What job scheduler do you use? As I recall, 10x supports LSF and SGE; SLURM is not officially supported, but it does work. You should allocate resources properly when starting Supernova jobs, and two days may not be enough time. This thread has some useful information, albeit from an older version of Supernova: 10x Supernova de novo assembly.
Hi Genomax,
Thanks for your reply. I am very new to anything computational-biology related, as I am a first-year rotating graduate student, so please bear with me. I have read many other troubleshooting posts on this forum. I am working with SDSC to see if I can incorporate checkpoint restart into my scripts, so a job can pick up where it left off, since wallclock limits are, well, limiting.
Here is my script:
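In outline, it is a plain SLURM wrapper around supernova run; the account, module name, and paths below are placeholders rather than the real values:

#!/bin/bash
#SBATCH --job-name=supernova_sunflower
#SBATCH --account=my_account          # placeholder
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16            # 16 cores on Savio; 64 on Comet
#SBATCH --mem=500G
#SBATCH --time=72:00:00               # the wallclock limit in question

module load supernova/2.0.0           # exact module name varies by cluster

supernova run --id=sunflower_asm \
    --fastqs=/path/to/fastqs \
    --maxreads=10000000 \
    --localcores=$SLURM_CPUS_PER_TASK \
    --localmem=500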
Is this not ideal? It has finished on Savio with --maxreads=10M, but the resulting assembly was of extremely poor quality.
@Peter: There is a separate settings file that can be found in
supernova/2.0.0/supernova-2.0.0/martian-cs/2.3.1/jobmanagers/
hierarchy. Has that been properly configured on your cluster for the scheduler you are using? Is the memory specification correct (is it in GB?), and does the #SBATCH value match what you are using on the command line? (In previous versions this was controlled by the file mentioned above.)
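From memory, the slurm.template in that directory looks roughly like the following; the __MRO_*__ placeholders are filled in by the pipeline at runtime, and the --mem line is where the GB question comes in (check the copy shipped with your installation, since placeholder names can differ between versions):

#!/usr/bin/env bash
#SBATCH -J __MRO_JOB_NAME__
#SBATCH --export=ALL
#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__
#SBATCH --signal=2
#SBATCH --no-requeue
#SBATCH --mem=__MRO_MEM_GB__G         # memory request, in GB
#SBATCH -o __MRO_STDOUT__
#SBATCH -e __MRO_STDERR__

__MRO_CMD__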
With 64 cores and 1.4 TB RAM you should be able to finish this job within two days. Have you prepared the fastqs using
supernova mkfastq
? Are you sure you have enough reads? The default --maxreads setting is 1.2B (calculated for the 3.2 Gb human genome at ~56x coverage).
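If not, a typical mkfastq invocation looks roughly like this (the run folder and sample sheet are placeholders; check supernova mkfastq --help for the exact options in your version):

supernova mkfastq --run=/path/to/bcl_run_folder \
    --csv=samplesheet.csv             # simple CSV with Lane,Sample,Index columns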
You've only got around 11-fold coverage. Well below the recommended minimum of 38x.
https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running
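The arithmetic behind that estimate, assuming 150 bp HiSeq X reads:

# raw coverage = (number of reads * read length) / genome size
echo "261000000 * 150 / 3600000000" | bc -l   # ~10.9x for this sunflower data
echo "1200000000 * 150 / 3200000000" | bc -l  # ~56x for the human-genome default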
While the number of reads is less than recommended, the first challenge for @Peter is to get the software to finish a run at all. Though not optimal, he should at least get some contigs.
Peter, could you also confirm whether these are Gemcoded reads made from a 10x Chromium library, rather than just any old Illumina data?
I've recently been testing my system with the aphid genome to see if it's up to the task before I apply it to my own data. 261M reads on the system you have ought to complete in well under a day.
@Andy: Since these are not answers to the original question, you should post such comments using
ADD COMMENT
(original post)/ADD REPLY
(other existing comments).

Hey, did you manage to get checkpointing to work? I'm having the exact same problem with Supernova on Comet.
I think the checkpointing is automatic. You just need to restart the job (if I recall correctly).
So the job timed out and you just kicked it off again? Or did you have to ask it to checkpoint? I originally did the latter, but it didn't create a checkpoint file.
I believe that is correct (it has been some time since I ran a 10x job). I think it keeps track of where things are. You don't need to create a file.
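So resuming should amount to rerunning the same command with the same --id; Martian tracks completed stages inside the pipestance directory. A sketch, assuming the job was killed at the wallclock limit (the --id here is hypothetical, and the _lock cleanup is only needed if the run complains that the pipestance is locked):

rm -f sunflower_asm/_lock             # only if a stale lock is reported
supernova run --id=sunflower_asm --fastqs=/path/to/fastqs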
Awesome! Thanks for the fast responses!
Hi all, I am having a lot of trouble getting started with the Supernova assembly. It keeps giving me one error or another, and none of them quite makes sense.
I have the same sample with four indices, and Supernova either sees them as separate samples, refuses to identify the directory they are in, or simply quits. It's a new error every time.
Here is my script:
Here is the error: "there is no file"
Please help, Shaili.
Do you have your data files in a directory called
/STS/fastq_gz/
?
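Supernova expects bcl2fastq/mkfastq-style file names inside the --fastqs directory; assuming your sample is called STS, something like:

STS_S1_L001_R1_001.fastq.gz
STS_S1_L001_R2_001.fastq.gz

ls /STS/fastq_gz/*.fastq.gz           # confirm the files are actually there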
Hi, I am running a de novo assembly of canola flower midge with Supernova 2.1.1.
I wrote a SLURM job script, which gave me lots of errors; in particular, module load supernova 2.1.1 fails with "there is no such module". Would you be able to share your SLURM job script with me? It would be very helpful, as I could edit it accordingly. Thank you.
Is this old data that you are trying to use? 10x stopped supporting linked reads a while back.
You can't simply use a SLURM script since
supernova
requires some SLURM-specific edits to its config files. Are you working on a cluster? Does your cluster use
SLURM
as a job scheduler? If not, you will not be able to use this specific script.
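Two quick checks before editing any script (the module may be named or versioned differently on your cluster):

module avail 2>&1 | grep -i supernova   # what is the module actually called?
sinfo --version                         # succeeds only if SLURM is the scheduler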