Trowel - error correction module for Illumina reads
9.7 years ago
anon

Hi All,

Has anyone used Trowel (http://sourceforge.net/projects/trowel-ec/)? If so, what were your results and experiences?

Thanks a lot!

9.6 years ago
abysslover

I'm the author of Trowel. Here is a brief overview of its strengths.

The results

Accuracy

The accuracy of sequencing error correction can vary with read coverage, genome size, quality values, read length, and other factors. In our evaluation on several datasets (see the supplementary data of the original paper), Trowel achieves relatively high accuracy on the standard metrics: specificity, sensitivity, precision, the F-score derived from them, and Gain. Papers on sequencing error correction normally do not evaluate accuracy with paired-end alignments, which reveal mis-corrections in a longer sequence context; we did. In addition, other tools have not reported how the alignment state of each read changes before and after correction. For example, mis-corrections by some tools can leave previously mappable reads unaligned. Trowel preserved the alignment state of almost all reads.
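As a rough illustration of the metrics listed above, here is a minimal sketch that computes them from base-level counts. It assumes the definitions commonly used in error-correction benchmarks (TP = erroneous bases corrected to the true base, FP = correct bases wrongly changed, FN = erroneous bases not properly corrected, TN = correct bases left alone); exact conventions differ between papers, and the function and parameter names here are hypothetical, not taken from Trowel.

```python
def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def correction_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Base-level accuracy metrics for a read error corrector (assumed definitions).

    tp: erroneous bases changed to the true base
    fp: correct bases that were changed (mis-corrections)
    fn: erroneous bases not corrected (left as-is or changed to another wrong base)
    tn: correct bases left unchanged
    """
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        # Harmonic mean of precision and sensitivity.
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Gain penalizes mis-corrections: it becomes negative when a tool
        # introduces more errors than it fixes.
        "gain": _ratio(tp - fp, tp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    m = correction_metrics(tp=90, fp=5, fn=10, tn=9895)
    for name, value in m.items():
        print(f"{name:12s} {value:.4f}")
```

Gain is the metric that most directly reflects the mis-correction problem described above: a tool with high sensitivity can still have low or negative Gain if it changes many correct bases.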

Pay attention to the sensitivity of error correction, which indicates how many errors the tool corrected versus left uncorrected. Sensitivity depends strongly on the coverage depth of a dataset and on the k-mer length. Trowel uses only the quality values of reads, which means it can correct sequencing errors even in low-coverage datasets. However, with fewer observations of the true sequence, sensitivity will be lower for low-coverage datasets. For such datasets, an alignment-based (overlap-consensus) method is the better option; alternatively, you could reduce the k-mer size.
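To see why both coverage and k-mer length affect sensitivity, a widely used rule of thumb relates base coverage C and read length L to the expected k-mer coverage Ck = C * (L - k + 1) / L. The sketch below assumes uniform (Poisson) sampling of error-free reads, which is a simplification; it is not Trowel's internal model, and the function names are hypothetical.

```python
import math


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage: Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def prob_seen_at_most_once(kmer_cov: float) -> float:
    """P(X <= 1) for X ~ Poisson(kmer_cov): chance a true k-mer is seen 0 or 1 times."""
    return math.exp(-kmer_cov) * (1 + kmer_cov)


if __name__ == "__main__":
    # Hypothetical dataset: 100 bp reads at 5x and 30x base coverage.
    for coverage in (5, 30):
        for k in (15, 21, 31):
            ck = kmer_coverage(coverage, 100, k)
            print(f"C={coverage:>2}x k={k}: Ck={ck:.1f}, "
                  f"P(seen <= 1)={prob_seen_at_most_once(ck):.3f}")
```

At low coverage, many true k-mers are observed only once, so they are hard to distinguish from k-mers containing errors. A smaller k raises Ck, but shorter k-mers are also more likely to occur at several places in the genome, so the choice is a trade-off.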

Runtime & memory

Trowel is highly parallelized and supports only the shared-memory model. It is therefore best run on a single high-performance machine with a large amount of memory.

A future version of Trowel will reduce memory consumption (I am currently working on this).

In our experience, Trowel works well for genomes up to 500 Mb. We are still working on support for human-sized genomes.

If you have more questions, you can contact me by email: euncheon.lim at tue.mpg.de .

I hope this answer is useful.

Euncheon

Sorry for the late reply; your answer was useful to me. Thank you very much!

Szandra

You're welcome!

I've just finished implementing a new version of trowel. The feature highlights are as follows:

  1. support for larger genomes
  2. the ability to correct several files from the same sample together (e.g., when you have several paired-end and mate-pair libraries from a single sample)
  3. reduced memory consumption
  4. dynamic k-mer length (higher sensitivity; users no longer need to specify the k-mer length)

The source code is not publicly available because it contains unpublished ideas. We provide only a binary for 64-bit Linux.

9.1 years ago

Hello Euncheon

Trowel is pretty useful, and I see that you have made some excellent updates. You said "The sensitivity is highly dependent on the coverage-depth of a dataset and k-mer length." What coverage do you consider high or low, and what counts as a large genome for Trowel? It would be helpful to give ranges for coverage and genome size, to make it easier to choose an appropriate k between 11 and 15.

Thank you

By low coverage, I originally meant datasets with an average coverage below 10.
In fact, the original Trowel can work at a coverage of 1 as long as high-quality template sequences are present, which distinguishes it from conventional methods.
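The point about high-quality sequences rests on Phred quality values, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below decodes a FASTQ quality string (assuming the Phred+33 encoding used by Illumina 1.8 and later; older Illumina pipelines used an offset of 64) and keeps only k-mers whose bases all reach a quality threshold. This filter and its names, including the min_q=30 default, are hypothetical illustrations of why quality values let a single observation be trusted; they are not Trowel's actual algorithm.

```python
def phred_to_error_prob(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_qualities(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string (Phred+33 assumed by default)."""
    return [ord(c) - offset for c in qual]


def high_quality_kmers(seq: str, qual: str, k: int, min_q: int = 30) -> list:
    """Hypothetical filter: k-mers whose every base has quality >= min_q."""
    qs = decode_qualities(qual)
    if len(qs) != len(seq):
        raise ValueError("sequence and quality strings differ in length")
    kmers = []
    for i in range(len(seq) - k + 1):
        # Keep the window only if its weakest base still passes the threshold.
        if min(qs[i:i + k]) >= min_q:
            kmers.append(seq[i:i + k])
    return kmers


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(high_quality_kmers("ACGTACGTAC", "IIIII#IIII", k=3))
    print(phred_to_error_prob(30))
```

Windows that overlap the low-quality base at position 5 are dropped, while the remaining k-mers can be treated as reliable even though each one is observed in only a single read.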

Large genomes are not supported because the current algorithm does not include parallel I/O or a memory-efficient index. I have not tested Trowel 1 on genomes larger than 500 Mb.

For the k-mer length, we recommend k in the range 15-31, using only odd values to deal with palindromic k-mers.
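A k-mer is palindromic when it equals its own reverse complement (e.g., ACGT). Tools that store one canonical form for a k-mer and its reverse complement cannot tell which strand such a k-mer came from. With odd k, the middle base would have to be its own complement, which no base is, so palindromic k-mers cannot occur. The brute-force check below demonstrates this general property; it is not code from Trowel, and the function names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the sequence."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def is_palindromic(kmer: str) -> bool:
    return kmer == reverse_complement(kmer)


def count_palindromic(k: int) -> int:
    """Brute-force count of palindromic k-mers over the alphabet ACGT."""
    return sum(is_palindromic("".join(p)) for p in product("ACGT", repeat=k))


if __name__ == "__main__":
    # Odd k gives 0; even k gives 4 ** (k // 2).
    for k in range(1, 9):
        print(k, count_palindromic(k))
```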

If you are planning to evaluate Trowel, use Trowel 1. Trowel 2 contains many experimental algorithms, and I can confirm that it is much less accurate than Trowel 1: I removed some rigorous algorithms from Trowel 2, which led to very poor sensitivity. I have no time to improve Trowel 1 or 2 right now because of my study plans. Once things settle down, I will come back to improve Trowel.
