Question

10X Genomica Supernova Troubleshooting

0

Entering edit mode

6.7 years ago

peter_stokes ▴ 10

Hi all!

I am attempting to perform de novo assembly of sunflower with Supernova 2.0.0.

I am having some difficulty getting it to finish within the wallclock limit for resources I am using. I have a wallclock limit of 48 hours on SDSC Comet (64 cores, 1.4TB memory) and 72 hours on Savio here at UCB (16 cores, 512GB memory).

I typically have not been including --maxreads in my scripts, assuming that will produce the best quality assembly, but this is not realistic considering my wallclock limits. One question I have is whether or not I should limit the number of reads to what the sequencing company has given in their report. Our sequences are from HiSeq X, and it says that the number of reads are 261M reads. Is this "reads" from the sequencing company different than the reads (as in maxreads) for supernova?

Also, do you set localcores and localmem? Or do you just let the program use the resources available on that node?

I should also add that the genome is 3.6G-bases, quite large. I also expect some heterozygosity.

Thanks!

assembly • 4.2k views

ADD COMMENT • link updated 23 months ago by GenoMax 147k • written 6.7 years ago by peter_stokes ▴ 10

0

Entering edit mode

Also, do you set localcores and localmem? Or do you just let the program use the resources available on that node?

What job scheduler do you use? As I recall 10x supports LSF and SGE. While SLURM is not supported officially it does work. You should allocate resources properly when starting Supernova jobs. 2 d may not be enough time. This thread hassome useful information albeit from an older version of Supernova: 10x Supernova de novo assembly .

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Hi Genomax,

Thanks for your reply. I am so new to anything computational biology related, as I am a first year rotating graduate student - please bear with me. I have read many other troubleshooting posts on this forum. I am working with SDSC to see if I can incorporate checkpoint restart in my scripts to pick up where I left off since wallclock limits are, well, limiting.

Here is my script:

#!/bin/bash 
#SBATCH -D /oasis/scratch/comet/petersto/temp_project/assemblyOutput
#SBATCH -J supernova
#SBATCH --partition=large-shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=1400000
#SBATCH --time=48:00:00
#SBATCH -o /oasis/projects/nsf/ddp319/petersto/err_outs/assemblyFull7506.out
#SBATCH -e /oasis/projects/nsf/ddp319/petersto/err_outs/assemblyFull7506.err
#SBATCH --mail-user=xxxx
#SBATCH --mail-type=All

export PATH=/oasis/projects/nsf/ddp319/petersto/programs/supernova/supernova-2.0.0:$PATH

supernova run --id=assemblyFull7506 \
--fastqs=/oasis/projects/nsf/ddp319/petersto/10X_seqData/ \
--localcores=64 \
--localmem=1400 \
--description="Full Assembly 7506"

Is this not ideal? It has finished on Savio with --maxreads=10M, with extremely poor quality

ADD REPLY • link updated 6.7 years ago by GenoMax 147k • written 6.7 years ago by peter_stokes ▴ 10

0

Entering edit mode

@Peter: There is a separate settings file that can be found in supernova/2.0.0/supernova-2.0.0/martian-cs/2.3.1/jobmanagers/ hierarchy. Has that been properly configured on your cluster for the scheduler you are using?

Is the memory specification correct (is that in GB?) and does the #SBATCH value match what you are using on the command line (in previous version this was controlled by the file I mentioned above).

e.g. With 64 cores and 1.4T RAM you should be able to finish this job in two days. Have you prepared the fastqs using supernova mkfastq?

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Are you sure you have enough reads? The default setting is 1.2B (calculated for the 3.2 Gb human genome @ 57x coverage).

You've only got around 11-fold coverage. Well below the recommended minimum of 38x.

https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running

ADD REPLY • link 6.7 years ago by Andy ▴ 20

0

Entering edit mode

While the number of reads is less than the recommended first challenge for @peter is to get the software to finish an analysis. While not optimal he should at least get some contigs.

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Peter.....could you also confirm whether these are Gemcoded reads made from a 10X Chromium library rather than any old Illumina data.

I've recently been testing my system with the aphid genome to see it's up to the task before I apply it to my own data. 261M reads on the system you have ought to complete in well under 1 day.

ADD REPLY • link 6.7 years ago by Andy ▴ 20

0

Entering edit mode

@Andy: Since these are not answers for the original question you should post these comments by using ADD COMMENT (original post)/ADD REPLY (other existing comments).

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Hey did you manage to get check pointing to work? I'm having the exact same problem on supernova on comet

ADD REPLY • link 6.7 years ago by rcw27 ▴ 10

0

Entering edit mode

I think the check pointing is automatic. You just need to restart the job (if I recall right).

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

So the job timed out and you just kicked it off again? Or you had to ask it to checkpoint? I originally did the latter but it didn't create a checkpoint file

ADD REPLY • link 6.7 years ago by rcw27 ▴ 10

0

Entering edit mode

I believe that is correct (it has been some time since I ran a 10x job). I think it keeps track of where things are. You don't need to create a file.

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Awesome! Thanks for the fast responses!

ADD REPLY • link 6.7 years ago by rcw27 ▴ 10

0

Entering edit mode

Hi All, I am having a lot of trouble getting started with the supernova assembly. It keeps giving me one or the other error none of it quite makes sense.

I have the same sample with four indexes and it sees it as separate samples or refuses to identify the directory its in or just simply quits. Its a new error everytime.

Here is my script:

#!/bin/bash
#SBATCH --job-name=Supernova
#SBATCH --mem=256G
#SBATCH -t 10-00:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user shailij@gmail.edu
#SBATCH -A bblock
#SBATCH -o %x.o%j
date
hostname
module load supernova/2.1.1
supernova run --id=18032FL-99_SJ \
 --sample=18032FL-99-23-01-index1, 18032FL-99-23-01-index2, 18032FL-99-23-01-index3, 18032FL-99-23-01-index4
 --fastqs=/STS/fastq_gz/ \
 --description=STS_10Xassembly \
 --maxreads='all'
echo 'Done'

Here is the error: there is no file

--fastqs=/STS/fastq_gz/

Please help, Shaili.

ADD REPLY • link updated 3.1 years ago by GenoMax 147k • written 3.1 years ago by shailij • 0

0

Entering edit mode

Do you have your data files in directory called /STS/fastq_gz/?

ADD REPLY • link 3.1 years ago by GenoMax 147k

0

Entering edit mode

Hi I am running de novo assembly of Canola flower midge with Supernova 2.1.1

I wrote a slurm job script which gave me lots of errors. especially it says the module load supernova 2.1.1 " there is no such module" Would you be able to share the your slurm job script with me? It would be so helpful to me as I can edited it accordingly. Thank you

ADD REPLY • link 23 months ago by Kanishka • 0

0

Entering edit mode

Is this old data that you are trying to use since 10x stopped supporting linked reads a while back.

You can't simply use a SLURM script since supernova requires some SLURM specific edits to config files. Are you working on a cluster? Does your cluster use SLURM as a job scheduler? If not you will not be able to use this specific script.

ADD REPLY • link 23 months ago by GenoMax 147k

score 0 · Accepted Answer · 2018-04-18

Hey all!

Sorry for the long hiatus!

Turns out, it has an automatic checkpoint (so long as you don't adjust your script!!!!). I was making the mistake of constantly changing my script to increase efficiency. In doing so, upon submitting the job, the scheduler would see that as a new job, and overwrite all files from the previously "failed" (due to timeout) job with that same name.

I ended up with a 25X coverage genome, which I am happy about for what I am using it for! But, it worked!

With 1.45TB Memory and 48 cores, it took about 8-9 days. So still a very very long time. Some steps took the entire 48 hour wall clock limit at SDSC's comet.

Anyways, thanks for all the help and suggestions; if anyone would like to see my scripts for the jobs I submitted on SDSC Comet, I am happy to provide :).