Forum: Careful!! Velvet Generates Different Assemblies From The Same Input Data And Same Input Parameters With OpenMP Enabled
11.0 years ago
Rahul Sharma ▴ 660

Hi all,

I am using four different libraries with a read length of 76 bp; the insert sizes are 300 bp, 1 kb, 8 kb and 12 kb. The expected genome size is 80 Mb.

I am running velvet with these four libraries. I first ran velvet with different k-mers, and then, for the best k-mer, tried different cov_cutoff values. For all the assemblies from the same k-mer (69) with different cov_cutoffs, I reused the same Roadmaps and Sequences files from the initial velveth run (k-mer 69, default cov_cutoff).

Surprisingly, I got an N50 of 10 Mb and a largest scaffold of 23 Mb using a cov_cutoff of 12 (median coverage 30.76 in the Log file) on the previously generated Roadmaps and Sequences files. Later I ran a completely new assembly from the same reads with k-mer 69 and cov_cutoff 12, and now my N50 is 2 Mb and the largest scaffold is 6.78 Mb.

I then reran velveth with the same input files and the same parameters, and found that it generates a different Roadmaps file for the same k-mer (69) in each of the three runs. What could be the reason behind this? As it stands, the results cannot be regenerated.

I would really appreciate your comments on this.

Best regards,
Rahul

velvet genome • 9.1k views
11.0 years ago
SES 8.6k

The solution is to set the OpenMP thread environment variables in your shell script:

#!/bin/bash

# Force velvet to run single-threaded so the output is deterministic
export OMP_THREAD_LIMIT=1
export OMP_NUM_THREADS=1

velveth ....
velvetg .

This will ensure the same results, but it means velveth will likely take longer to execute. For finding the optimal k-mer and coverage cutoff, I recommend using VelvetOptimiser. These OMP settings are especially important if you are using VelvetOptimiser, because otherwise every single thread will try to use all the processors on the node.
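To make the interaction concrete: each worker launched by VelvetOptimiser's -t runs its own velveth/velvetg, and each of those spawns OMP_NUM_THREADS OpenMP threads, so the total thread count is the product of the two. A minimal sketch of the budget arithmetic (the 24-core figure is an assumed example, not a value from the question):

```shell
#!/bin/sh
# Thread-budget sketch: keep (-t workers) x (OMP threads per worker)
# at or below the number of cores on the node.
CORES=24                  # assumed cores on the node
OMP=1                     # OMP_NUM_THREADS per velvet instance
VO_T=$(( CORES / OMP ))   # safe value for VelvetOptimiser -t
echo "-t $VO_T with OMP_NUM_THREADS=$OMP uses $(( VO_T * OMP )) threads"
```

With OMP_NUM_THREADS=1 you can safely give VelvetOptimiser all the cores via -t, and each run stays deterministic.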


Thanks for this, but velvet then takes ages to generate assemblies on a single processor. As for VelvetOptimiser, I cannot use it in my analysis: I have a huge dataset that consumes ~95% of the RAM on our whole group's machine, so I need to schedule jobs accordingly, and in this case manually optimizing velvet is the right choice.


Hi SES,

I am currently trying to set up my submission script for VelvetOptimiser on our cluster and was a bit confused about how the various parameters should be set. The plan was to spread the instances over 24 threads, so I initially used the following parameters:

  • In the SLURM submission script: OMP_NUM_THREADS=24, OPENBLAS_NUM_THREADS=24, --cpus=24, --cpus-per-task=24, mem=256Gb
  • In the VelvetOptimiser command line: -t 24

However, this appears to have caused thread over-allocation (perhaps it tried to allocate 24 × 24 threads?), and the program eventually crashed with the following error messages (I am not sure the thread allocation actually caused the crash; if you think otherwise, I would appreciate any recommendation :-) ):

Use of uninitialized value in numeric ne (!=) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 289.
Use of uninitialized value $maxScore in numeric gt (>) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 290.
Use of uninitialized value in numeric gt (>) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 290.
Aug 11 16:12:51 Hash value of best assembly by assembly score: 79
Aug 11 16:12:51 Optimisation routine chosen for best assembly: shortPaired
Aug 11 16:12:51 Looking for the expected coverage
Unable to open /gpfsuser/home/users/c/s/csheridan/data/auto_data_79/stats.txt for exp_cov determination.
 at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 838

Anyway, I then tried to correct this by setting the thread parameters as follows, but things were much slower, with Velvet computing only 3 hash values at a time:

  • In the SLURM submission script: OMP_NUM_THREADS=8, OPENBLAS_NUM_THREADS=8, --cpus=24, --cpus-per-task=24, mem=256Gb
  • In the VelvetOptimiser command line: -t 3
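For reference, the thread arithmetic behind the two attempts above (assuming 24 cores per node, which is my guess for our configuration):

```shell
#!/bin/sh
# Attempt 1: -t 24 workers, each with OMP_NUM_THREADS=24
echo "attempt 1: $(( 24 * 24 )) threads on a 24-core node"  # heavily oversubscribed
# Attempt 2: -t 3 workers, each with OMP_NUM_THREADS=8
echo "attempt 2: $(( 3 * 8 )) threads on a 24-core node"    # fits, but only 3 hash values at once
```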

So my question is: how exactly should the thread parameters be set in order to benefit from OpenMP, and how can Velvet/Oases be optimally parallelised?

Thank you!


It is not a good idea to post a new question in the comment section. On Biostars we like to keep a single topic per thread. In addition, very few people will see your post here, so it is also inefficient.

11.0 years ago

I ran a test and my files are identical:

velveth a31 31 -shortPaired -fastq -separate r1.fq r2.fq
velveth b31 31 -shortPaired -fastq -separate r1.fq r2.fq
cmp a31/Roadmaps b31/Roadmaps

No output is returned from cmp, so the Roadmaps files are identical. Running velvetg on each also produces identical results:

velvetg a31 -exp_cov auto
Final graph has 1055 nodes and n50 of 15415, max 65035, total 1446385, using 159599/200002 reads

velvetg b31 -exp_cov auto
Final graph has 1055 nodes and n50 of 15415, max 65035, total 1446385, using 159599/200002 reads
Hello,

I ran velvet on paired-end transcriptome data using cov_cutoff 4.0 and min_contig_lgth 200:

Final graph has 251014 nodes and n50 of 562, max 8255, total 32301310, using 0/32799252 reads

What do the nodes (251014), max and total numbers in the velvet output indicate? I have not been able to work it out.

Please answer me on this.

11.0 years ago
Rahul Sharma ▴ 660

It is now clear that velvet generates different assemblies because of OpenMP, so what should be done now? Should I go with the assemblies with the 10 Mb N50? How realistic would that be? From a publication point of view, our results should be reproducible with the same methods, but when I run velvet five different times with the same data, k-mer 69 and cov_cutoff 12, it gives me five different assemblies. Which assembly should one trust then?

I would appreciate your suggestions on this!

Thanks in advance!
Rahul


Well, as long as we accept that velvet works and that each assembly is valid, just that one is more complete than another, I would pick the best assembly that works.

After all, this is what people do anyway when sweeping over the parameter space.

Then describe the reproducibility issue in the supplementary materials.

