De novo assembly of a virus genome with Velvet
10.0 years ago
Peter • 0

Dear experts,

I want to assemble a virus genome using Velvet. The genome is about 3 kb long. In the commands below, ref1.fa is the reference genome. The assembly does not succeed: I cannot recover ref1.fa from the simulated reads.

~/bin/bioinfomatics/wgsim-master/wgsim -N 500000 -1 100 -2 100 -h ~/projects/virus/analysis/ref1.fa r1.fq r2.fq
~/bin/bioinfomatics/velvet_1.2.10/contrib/shuffleSequences_fasta/shuffleSequences_fasta.pl r1.fq r2.fq output.fq

~/bin/bioinfomatics/velvet_1.2.10/contrib/VelvetOptimiser-2.2.4/VelvetOptimiser.pl\
    -s 27 -e 31 -f '-longPaired -fastq output.fq' -t 4 --optFuncKmer 'n50'

Dec  1 17:18:33
Will run velvet optimiser with the following paramters:
    Velveth parameter string:
        -shortPaired -fastq output.fq
    Velveth start hash values:    27
    Velveth end hash value:        31
    Velveth hash step value:    2
    Velvetg minimum coverage cutoff to use:    0

    Read tracking for final assembly off.
Dec  1 17:18:33

    Beginning velveth runs.
********************************************************
Assembly id: 1
Velveth timestamp: Dec  1 2014 17:18:57
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_27 27 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_27
Velvet hash value: 27
Roadmap file size: 110519999
**********************************************************
********************************************************
Assembly id: 2
Velveth timestamp: Dec  1 2014 17:18:59
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_29 29 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_29
Velvet hash value: 29
Roadmap file size: 107811920
**********************************************************
********************************************************
Assembly id: 3
Velveth timestamp: Dec  1 2014 17:19:00
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
**********************************************************
Dec  1 17:19:00

    Beginning vanilla velvetg runs.
********************************************************
Assembly id: 1
Assembly score: 53
Velveth timestamp: Dec  1 2014 17:18:57
Velvetg timestamp: Dec  1 2014 17:21:52
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_27 27 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_27  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_27
Velvet hash value: 27
Roadmap file size: 110519999
Total number of contigs: 1062
n50: 53
length of longest contig: 95
Total bases in contigs: 58657
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
********************************************************
Assembly id: 2
Assembly score: 57
Velveth timestamp: Dec  1 2014 17:18:59
Velvetg timestamp: Dec  1 2014 17:22:06
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_29 29 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_29  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_29
Velvet hash value: 29
Roadmap file size: 107811920
Total number of contigs: 424
n50: 57
length of longest contig: 95
Total bases in contigs: 25334
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
********************************************************
Assembly id: 3
Assembly score: 61
Velveth timestamp: Dec  1 2014 17:19:00
Velvetg timestamp: Dec  1 2014 17:22:07
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_31  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
Total number of contigs: 1917
n50: 61
length of longest contig: 99
Total bases in contigs: 121773
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
Dec  1 17:22:07 Best assembly by assembly score - assembly id: 3
Dec  1 17:22:07 Optimisation routine chosen for best assembly: shortPaired
Dec  1 17:22:07 Looking for the expected coverage
Dec  1 17:22:09        Expected coverage set to 0
********************************************************
Assembly id: 3
Assembly score: 61
Velveth timestamp: Dec  1 2014 17:19:00
Velvetg timestamp: Dec  1 2014 17:22:07
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_31  -clean yes -exp_cov 0
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
Total number of contigs: 1917
n50: 61
length of longest contig: 99
Total bases in contigs: 121773
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
Paired Library insert stats:
**********************************************************
Dec  1 17:22:09 Setting the short insert length
Dec  1 17:22:09 Setting assembly short insert length(s) to auto
Dec  1 17:22:09 Beginning coverage cutoff optimisation
Minimum specified coverage cutoff is higher than the expected coverage. Please choose a minimum coverage cutoff smaller than 0 and re-run.
Assembly velvet • 5.1k views

Here is my reference.

>2547-16_ASC_B
CTCCACCACTTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCCCTGTACTTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTGAGCCCTGCTCAGAATACTGTCTCTGCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTACCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACCTGTTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACAACCAGCACCGGACCATGCAAGACCTGCACAACTCCTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAACTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTTTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCCATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTCACAAAACAAAAAGATGGGGATATTCCCTTAACTTCATGGGATATGTAATTGGGAGTTGGGGCACATTGCCACAGGAACATATTGTACAAAAAATCAAAATGTGTTTTAGGAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTATGTCAACGAATTGTGGGTCTTTTGGGGTTTGCCGCACCTTTCACGCAATGTGGATATCCTGCTTTAATGCCTTTATATGCATGCATACAAGCAAAACAGGCTTTTACTTTCTCGCCAACTTACAAGGCCTTTCTAAGTCAACAGTATTTGAACCTTTACCCCGTTGCTCGGCAACGGCCTGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGGGCAAAACTCATCGGGACTGACAATTCTGTCGTGCTCTCCCGCAAGTATACATCATTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGCCGCTTGGGGCTCTACCGCCCGCTTCTCCGCCTATTGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGCCCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCACAGGAACCTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGCAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTAATGAGTGGGAGGAGTTGGGGGAGGAGGTGAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTGTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTACTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTGGGGCATGGACATTGACCCGTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGACTTCTTTCCTTCTATTCGAGATCTCCTCGACACCG
CCTCTGCTCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGTTCACCTCACCATACGGCACTCAGGCAAGCTATTCTGTGTTGGGGTGAGTTGATGAATCTAGCAACCTGGGTGGGAAGTAATTTGGAAGATCCAGCATCCAGGGAATTAGTAGTCAGCTATGTCAACGTTAACATGGGCCTAAAAATCAGACAACTATTGTGGTTTCATATTTCCTGTCTTACTTTTGGGAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCTCCTGCATATAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGAAGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAGTATTCCTTGGACACATAAGGTGGGAAACTTTACGGGGCTTTATTCTTCTACGGTACCCTGCTTTAATCCTAAATGGCAAACTCCTTCTTTTCCCGACATTCATTTGCAGGAGGACATTGTTGATAGATGTAAGCAATTTGTGGGGCCCCTTACAGTAAATGAAAACAGGAGACTAAAATTAATTATGCCTGCTAGGTTTTATCCCAATGTTACTAAATATTTGCCCTTAGATAAAGGGATCAAACCGTATTATCCAGAGTATGTAGTTAATCATTACTTCCAGACGCGACATTATTTACACACTCTTTGGAAGGCGGGGATCTTATATAAAAGAGAGTCCACACGTAGCGCCTCATTTTGCGGGTCACCATATTCTTGGGAACAAGATCTACAGCATGGGAGGTTGGTCTTCCAAACCTCGAAAAGGCATGGGGACAAATCTTTCTGTCCCCAATCCCCTGGGATTCTTCCCCGATCATCAGTTGGACCCTGCATTCAAAGCCAACTCAGAAAATCCAGATTGGGACCTCAACCCGCACAAGGACACCTGGCCGGACGCCAACAAGGTGGGAGTGGGAGCATTCGGGCCAGGGTTCACCCCTCCCCATGGGGGACTGTTGGGGTGGAGCCCTCAGGCTCAGGGCCTACTCGCAACTGTGCCAGCAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTCAGGAAGGCAGCCTACTCCCTTATCTCCACCTCTAAGGGACACTCATCCTCAGGCCATGCAGTGGAA
10.0 years ago
Daniel ★ 4.0k

Something appears to be going wrong before that error, as none of your contigs is longer than 99 bases. A few points:

  • Your input is 100 bp paired-end, right? I wouldn't describe that as 'long paired' in your command. VelvetOptimiser appears to be running -shortPaired anyway, so that may or may not be an issue.
  • What does the quality of the data look like? I don't know wgsim, but could the artificial error profiles be interfering with the assembly?
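
Before looking at the assembler, it may be worth checking that the interleaved file is still valid FASTQ. The command in the question runs shuffleSequences_fasta.pl on .fq files; if that script copies two lines per read, as a FASTA interleaver would for single-line sequences, four-line FASTQ records would come out scrambled. Whether that happened here is not established in this thread. The sketch below is only an illustration: the function name is hypothetical, and it assumes mate names differ only by a trailing /1 or /2.

```python
from itertools import islice


def check_interleaved_fastq(lines):
    """Return a list of problems found in an interleaved FASTQ (iterable of lines)."""
    problems = []
    it = iter(lines)
    record_no = 0
    prev_base = None
    while True:
        # A FASTQ record is exactly four lines: header, sequence, '+', qualities.
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            break
        record_no += 1
        if len(rec) < 4:
            problems.append(f"record {record_no}: truncated ({len(rec)} lines)")
            break
        header, seq, plus, qual = rec
        if not header.startswith("@"):
            problems.append(f"record {record_no}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {record_no}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record_no}: sequence and quality lengths differ")
        # Assumption: mates share a name apart from a trailing /1 or /2.
        parts = header[1:].split()
        name = parts[0] if parts else ""
        base = name[:-2] if name.endswith(("/1", "/2")) else name
        if record_no % 2 == 1:
            prev_base = base
        elif base != prev_base:
            problems.append(f"record {record_no}: mate name does not match previous read")
    return problems


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python check_fastq.py output.fq
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for problem in check_interleaved_fastq(handle)[:20]:
                print(problem)
```

An empty result only means the record layout and pairing look consistent; it says nothing about base quality or error rate.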

My input data is 100 bp paired-end.

I simulated the data with wgsim. The data quality is fine.

10.0 years ago
rtliu ★ 2.2k

The coverage is too deep for Velvet to handle. Try reducing it to 50x-100x, e.g. wgsim -N 1500.

With your current simulated data, use velvet-estimate-exp_cov.pl to estimate a coverage cutoff (say 300), then add the parameters -exp_cov auto -cov_cutoff 300 to velvetg.
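
To see where these numbers come from, here is a rough back-of-the-envelope sketch. It assumes a genome of exactly 3000 bp (the question says "about 3 kb") and uses the k-mer coverage relation from the Velvet manual, Ck = C * (L - k + 1) / L. The function names are hypothetical, and the 1% error rate is purely illustrative, not wgsim's actual setting.

```python
def read_coverage(n_pairs, read_len, genome_len):
    """Average per-base coverage for paired-end data (two reads per pair)."""
    return n_pairs * 2 * read_len / genome_len


def kmer_coverage(base_cov, read_len, k):
    """k-mer coverage Ck = C * (L - k + 1) / L, as defined in the Velvet manual."""
    return base_cov * (read_len - k + 1) / read_len


def expected_error_copies(base_cov, error_rate):
    """Expected reads carrying one specific wrong base at one position.

    Assumes errors are substitutions spread evenly over the 3 other bases.
    """
    return base_cov * error_rate / 3


GENOME_LEN = 3000  # assumption: "about 3 kb" from the question

deep = read_coverage(500_000, 100, GENOME_LEN)   # wgsim -N 500000
shallow = read_coverage(1_500, 100, GENOME_LEN)  # wgsim -N 1500

print(f"-N 500000: {deep:.0f}x base coverage, {kmer_coverage(deep, 100, 31):.0f}x at k=31")
print(f"-N 1500:   {shallow:.0f}x base coverage, {kmer_coverage(shallow, 100, 31):.0f}x at k=31")

# Illustrative 1% error rate: how often the same error recurs at each depth.
print(f"same error seen ~{expected_error_copies(deep, 0.01):.0f} times at deep coverage")
print(f"same error seen ~{expected_error_copies(shallow, 0.01):.2f} times at 100x")
```

At tens of thousands of fold coverage, the same sequencing error recurs often enough that erroneous k-mers no longer look like low-coverage noise, which is why a coverage cutoff far above the usual defaults is suggested for the existing data.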


Why is deeper coverage not better here?

The reference genome is about 3000 bp. Which is better: 2 x 300 bp, 2 x 100 bp, or 1 x 300 bp reads?


Try normalizing the reads before assembly (digital normalization): http://ged.msu.edu/papers/2012-diginorm/
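
The idea behind digital normalization can be sketched in a few lines: stream through the reads and keep a read only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. The version below is a toy illustration, not the implementation from the linked paper. It counts k-mers exactly in a dictionary, ignores reverse complements and read pairing, and the function names and the k and cutoff values are placeholders.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate from
        # Median k-mer count approximates the coverage already seen for this read.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # 100 copies of one read collapse to `cutoff` copies; a novel read is still kept.
    reads = ["ACGTTGCA"] * 100 + ["GGGGCCCCAA"]
    print(len(digital_normalize(reads, k=4, cutoff=5)))
```

For real data, paired reads would need to be kept or dropped together, and a memory-efficient counter is preferable to an exact dictionary.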
