Which is the best tool for plasmid assembly from Illumina TruSeq data?
I have used Velvet and SPAdes, but the results are not good.
Can anyone please suggest an assembler, or parameters for SPAdes and Velvet, that will give a good plasmid assembly?
I am getting around 700 contigs and a 6 Mb genome.
6 Mb is a typical size for a whole bacterial genome. A bacterial genome comprises a chromosome (usually only one) and often one or more plasmids. If you prepared the DNA from a single colony, you should get fewer than 100 contigs. 700 contigs indicates either that your DNA was not homogeneous (e.g. contaminated with a second strain of bacteria) or that the coverage of the chromosome is very low.
How did you prepare the DNA? Did you do anything to separate plasmid DNA from chromosomal DNA? Such separations are never 100% selective! My guess is that enough chromosomal DNA remained to be sequenced at low coverage, which is why the chromosomal DNA is dispersed over hundreds of contigs.
The FASTA file emitted by SPAdes reports the coverage of every contig in its header line, for example:
>NODE_1_length_711720_cov_34.8955_ID_4768
>NODE_24_length_3121_cov_199.103_ID_4814
Please sort your contigs by coverage, then inspect the contigs with the highest coverage. They will presumably comprise plasmid sequences (or the highly redundant rRNA genes).
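The sorting step can be sketched in a few lines of Python. This is only an illustration, assuming the headers follow the NODE_<n>_length_<len>_cov_<cov> pattern shown above; the function names and the min_length parameter are made up for this example and are not part of SPAdes. Note that the coverage SPAdes writes into the header is k-mer coverage, so compare contigs relative to each other rather than against the expected read depth.

```python
import re

# Matches headers such as ">NODE_24_length_3121_cov_199.103_ID_4814".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_spades_header(header):
    """Return node number, length and coverage from one header, or None."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return {
        "name": header.strip().lstrip(">"),
        "node": int(match.group(1)),
        "length": int(match.group(2)),
        "coverage": float(match.group(3)),
    }


def contigs_by_coverage(headers, min_length=0):
    """Sort contigs from highest to lowest coverage.

    min_length is a hypothetical convenience filter for dropping very
    short contigs; headers that do not match the pattern are skipped.
    """
    parsed = (parse_spades_header(h) for h in headers)
    kept = [c for c in parsed if c is not None and c["length"] >= min_length]
    return sorted(kept, key=lambda c: c["coverage"], reverse=True)


def read_fasta_headers(path):
    """Collect the header lines (starting with '>') of a FASTA file."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.startswith(">")]


if __name__ == "__main__":
    example = [
        ">NODE_1_length_711720_cov_34.8955_ID_4768",
        ">NODE_24_length_3121_cov_199.103_ID_4814",
    ]
    for contig in contigs_by_coverage(example):
        print(contig["node"], contig["length"], contig["coverage"])
```

With the two example headers, NODE_24 (coverage 199.103) is listed before NODE_1 (coverage 34.8955), which makes it the first candidate to inspect as plasmid sequence. For a real assembly, pass the result of read_fasta_headers() on your contigs file instead of the example list.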
The assemblers you tried should be able to do the job, provided that you have tried a reasonable number of assemblies with varying parameters. Whether you are assembling a plasmid or not makes little difference. Please provide more information about your dataset: for instance, the sequencing depth of the plasmid, the length of the reads, whether they are paired or not, and whether anything else is being sequenced. It would also be good to show the command lines you have already tried with Velvet and SPAdes, and to tell us why the results are not good.
I suspect the problem is the data, and not the assembler. I am dealing with plasmid assembly myself and I see problems with variable coverage. This probably has something to do with the biology of plasmid replication, since the variability is consistent between two different sequencing methods.
I would generally recommend SPAdes as the best assembler for things like plasmids. But considering that you have tried it, what, specifically, is the problem? Do you get too many contigs, or does it not assemble at all?
I would say this is a common problem with today's "short" read sequencing technology. A colleague of mine tried to sequence a short genome, and he needed 7 years to fully complete it; he eventually did it by using PacBio sequencing.
I think you need to use more than short reads. As in your case, you eventually discover that the assembly does not improve even though you increase the coverage of what you are sequencing.
The use of mate-pair reads will help you by giving better scaffolding of your contigs. If your plasmid is a commercial one, a comparison with trusted and similar plasmids using programs like Mauve will help a lot with ordering the contigs. You can also combine several kinds of sequences, such as regular Illumina, mate-pair, long Illumina reads and/or PacBio sequences. Otherwise, I think you will be facing a hard task.
Many thanks for the reply.
sequencing depth of plasmid => 550X
length of reads => 150
paired or not paired => paired
is there anything else that is being sequenced => no
command lines:
1.velvet=>
AND
2 spades
AND
The output I am getting is a genome of around 6 Mb in ~700 scaffolds, which is too far from the expected result.
My recommendations:
So there is something else being sequenced: the chromosomal genome. If that genome is available, or if you can assemble it, then I would try to filter out the reads that map to it (provided that there are no high-identity regions common to both the chromosome and the plasmid). It is very odd that you get only 2 contigs, that is, that you are able to assemble the chromosomal genome so well, if not fully, and cannot assemble the plasmid.
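The filtering idea could look roughly like the following Python sketch. It assumes you have already aligned the reads to the host genome with an aligner of your choice and have the result as a SAM file; the function names are made up for this illustration. It relies only on the SAM convention that the second column is a bit flag in which 0x4 means "read unmapped". A pair is dropped if either mate mapped, which keeps the two FASTQ files in sync when the same set of names is applied to both.

```python
def mapped_read_names(sam_lines):
    """Collect names of reads with at least one mapped alignment record."""
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped.add(name)
    return mapped


def read_name(fastq_header):
    """Reduce a FASTQ header to the read name an aligner would report.

    Assumption: the name is the first whitespace-delimited token, with an
    optional trailing /1 or /2 mate suffix removed.
    """
    name = fastq_header.strip()[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def filter_fastq(fastq_lines, names_to_drop):
    """Return the lines of all 4-line FASTQ records not in names_to_drop."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    kept = []
    for start in range(0, len(lines) - 3, 4):
        record = lines[start:start + 4]
        if read_name(record[0]) not in names_to_drop:
            kept.extend(record)
    return kept


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    ]
    fastq = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "TTTT", "+", "IIII"]
    print(filter_fastq(fastq, mapped_read_names(sam)))
```

In the toy input, r1 aligns to the host and is removed, while r2 is unmapped and is kept for the plasmid assembly. Keep the caveat above in mind: reads from regions shared between the host genome and the plasmid would be discarded as well.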
All of the suggestions given so far are good. I would add that you can try our tool Recycler. It takes into consideration some of the same features as suggested here - coverage, circularity of sequences, and paired end mapping. I posted more details here (and in the links therein): Recycler for plasmid assembly