Experiences With CLC Genomics Workbench (Or Other Commercial Tools) For Next-Gen Sequencing
3
9
14.7 years ago

In the past year or so I have gained some level of familiarity with multiple open source command line tools used for short read mapping. As expected, different approaches seem to have different strengths and weaknesses.

There are also a number of commercial applications, such as the CLC Genomics Workbench, that by their own reporting have superior performance characteristics.

I find them intriguing and, considering the costs of next-gen sequencing in general, the software's cost would probably not be out of reach if these indeed worked as advertised.

Has anyone here worked with both commercial and open-source methods? I would be most interested in hearing hands-on experiences, both positive and negative.

next-gen-sequencing • 13k views
2

I hope to be able to come back and comment on this at some point. I've been contracted to investigate a number of commercial solutions for an NGS-focused diagnostics company, and we're lining up demos as I type.

1

... sounds great ... please do comment if at all possible

8
14.7 years ago

Hello Istvan,

I am presently going the opposite way from you: coming from CLC, I am exploring (anew) the possibilities of open software (mainly mira3 for now) for 454 assembly.

I work in a lab that mainly studies evolution and genomics in fishes. We have been using CLC Genomics Workbench for the last year with great results. The software really is a well-integrated resource with a lot of small functionalities. Nothing that you couldn't find in the open source world, but well put together.

We have mostly used the software to toy with RNA-seq 454 data in non-model species, so our expertise is with de-novo assembly of expressed sequences, namely cDNA from RNA containing poly-A tails.

I would say that the main strength of CLC is its ease of use, especially for non-computer-oriented biologists or even for non-hardcore Linux users. For example, using menus, it is easy to import .sff or fasta/fasta.qual data, trim sequences according to different criteria, do a de novo assembly, save consensus sequences from the contigs, do a reference assembly (possibly with only a subset of sequences), look for SNPs, and export SNP tables and ACE assembly files. We have done this repeatedly, and newcomers in the lab get to their results quickly without too much of a shock. It is, however, highly recommended to run the software on a 64-bit system with plenty of RAM (8 GB and up).
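For readers who want to see what one of these menu steps amounts to outside a GUI, here is a minimal sketch of end trimming by base quality. It is not CLC's trimming method (the criteria it uses are not described in this thread); the function name `trim_low_quality_ends` and the default threshold of 20 are assumptions made up for illustration.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases from both ends of a read until a base with quality >= min_q.

    `seq` is the read sequence and `quals` its per-base quality scores,
    as you would get from a fasta/fasta.qual pair. Hypothetical helper,
    not taken from CLC or any other tool.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    start, end = 0, len(seq)
    # Walk in from the 5' end past low-quality bases
    while start < end and quals[start] < min_q:
        start += 1
    # Walk in from the 3' end past low-quality bases
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Made-up read: two weak bases on each end
trimmed_seq, trimmed_quals = trim_low_quality_ends(
    "ACGTACGT", [5, 10, 30, 32, 35, 31, 12, 8]
)
print(trimmed_seq)  # GTAC
```

Real trimmers usually combine several criteria (adapter removal, ambiguous bases, minimum length), so treat this as one possible criterion only.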

All this comes at the price of some flexibility and some transparency, I guess. The reason I am looking into mira3 now is that CLC's de novo alignment algorithm appears not to be entirely appropriate for RNA-seq projects. The steps involved in the alignment lead to more gene chimeras and stranger coverage patterns within each contig than would be expected from a 'correct' approach.

Overall, our experience with CLC has been very satisfactory and we will likely continue to use it in the near future. As you mention, the (at first) scary cost of the license is small compared with the total cost of a single next-gen project, not to mention that you will probably use it for many.

Hoping this helps :)

Cheers

(don't hesitate to share your ideas on mira3 or other open software also :)

1

nice writeup, open source tools need more improvements in their usability

0

Hi there Eric, I'll soon start using CLCbio to perform RNAseq analysis, but I'm intrigued by your answer. I'd be using version 6.0.3 (I know your comment was made three years ago and some things may not apply to the updated versions). My question: by any chance, have you used a more recent version of the CLCbio software? If so, did you find the same tricky artifacts in de novo assembly with CLC? Thanks in advance for any reply :)

0

Hi rodrigues8998. We are still using CLC (the newest version) semi-routinely for different things, although I have never done any RNAseq analyses with it. Since it already had what looked like decent RNAseq capabilities 2-3 years ago, I think you will probably be able to do what you want with it. Be sure to read the doc to properly understand how CLC does those types of analyses and how it may differ from other approaches.

6
13.8 years ago

If anyone is still interested in this subject, I found this paper to be extremely helpful: Comparing de novo assemblers for 454 transcriptome data

It compares Newbler 2.3, Newbler 2.5, CAP3, CLC, SeqMan, and MIRA.

The quick and dirty of it:

  • Newbler 2.3 is the worst and shouldn't be used
  • CLC is the fastest by far (4 min) and gathers a lot of unique contigs due to the de Bruijn graph algorithm used BUT is inaccurate and generates an overall smaller assembly
  • Newbler 2.5 is fantastic overall, generating an overall very large assembly in a moderate amount of time (45 min)
  • SeqMan is also fantastic, generating the most unique sequences and the largest assembly but taking much longer than Newbler (6 hrs)
  • CAP3 (1 day) was not distinguished, doing no better than Newbler 2.5 or SeqMan at anything yet taking longer
  • MIRA (3 days) was not distinguished, barely doing better than SeqMan at anything yet taking by far the longest

I hope this helps! I recommend reading the actual paper. It's only 10 or so pages and it will give you a better idea of what you can expect regarding your own situation.

Brandon

4
14.7 years ago

I think these commercial apps are quite useful for bench scientists, especially if there are no bioinformatics programmers in the lab.

I was intrigued by CLC because a group at UC Davis had reported excellent results mixing Velvet and CLC for de novo transcriptome assembly.

In the end I found the CLC white paper to be very honest and revealing. Basically, their old version returned a slightly better N50 using unpaired reads but with more misassemblies. Their new version was perhaps tuned too conservatively and produced shorter N50s on both paired and unpaired data, with no misassemblies.

http://www.clcbio.com/files/whitepapers/white_paper_on_de_novo_assembly_on_the_CLC_Assembly_Cell.pdf
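Since the comparison above hinges on N50, here is a small sketch of how that statistic is computed from a list of contig lengths. The contig lengths are made up for illustration and are not taken from the white paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Add up contigs from longest to shortest until half the total is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Two made-up assemblies with the same total size (100 bases each)
few_long = [50, 30, 20]
many_short = [20, 20, 20, 20, 20]
print(n50(few_long), n50(many_short))  # 50 20
```

Note that N50 only measures contiguity: a misassembly that wrongly joins two contigs raises N50, which is why a higher N50 with more misassemblies is not necessarily the better result.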

So I've held off for now, but I think CLCbio has some brilliant developers and would not rule out their product replacing a lot of what is done on the command line wrt RNAseq and assembly, especially if they can get it bundled with sequencers.
