Question

Software For Quality Filtering Of 454 Data Sets

14.8 years ago

Pawel Szczesny 3.2k

454 technology produces a number of errors in the reads, mostly (but not only) related to homopolymeric runs. This calls for some degree of quality filtering, that is, removing reads that contain false information. Filtering is often based on quite simple measures, such as the number of consecutive low-quality bases and the read length. Are there any approaches to quality filtering other than the ones implemented in the PyroNoise/AmpliconNoise packages?
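For concreteness, the "simple measures" mentioned above could be sketched roughly as follows. This is only an illustration: the function names and the thresholds (`min_len`, `min_q`, `max_low_run`) are hypothetical and not taken from any particular package.

```python
def longest_low_quality_run(quals, min_q=20):
    """Length of the longest stretch of consecutive bases with quality < min_q."""
    longest = current = 0
    for q in quals:
        # Extend the current run on a low-quality base, otherwise reset it.
        current = current + 1 if q < min_q else 0
        longest = max(longest, current)
    return longest


def passes_simple_filter(seq, quals, min_len=200, min_q=20, max_low_run=5):
    """Keep a read only if it is long enough and has no long low-quality stretch."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists differ in length")
    return (len(seq) >= min_len
            and longest_low_quality_run(quals, min_q) <= max_low_run)
```

A read with a ten-base stretch of quality 5 would be dropped under these defaults, as would any read shorter than 200 bases.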

genomics read quality filter • 5.7k views

ADD COMMENT • link updated 14.4 years ago by Casbon ★ 3.3k • written 14.8 years ago by Pawel Szczesny 3.2k

Most "denoising" methods come at the cost of losing information or data. When it is possible to process the raw data, it is better to work with that. 454 reads are not so difficult or different to process. I do not see much need for denoising, and few people are doing it.

ADD REPLY • link 14.8 years ago by lh3 33k

When using unfiltered data, one risks over-predicting microbial diversity in metagenomic samples. See "The 'rare biosphere': a reality check".

ADD REPLY • link updated 6.0 years ago by Ram 45k • written 14.8 years ago by Pawel Szczesny 3.2k

I actually think the word "denoising" is a little misleading. All PyroNoise does is "model" sequencing errors, and because its primary goal is clustering, its way of "denoising" by clustering is the right thing to do. Nonetheless, if you do not model sequencing errors in your application but simply rely on an independent denoising procedure, you will probably have to compromise. My overall advice is: explicitly model sequencing errors in your application, and do not rely on a third-party "denoiser" that is not built for your application.

ADD REPLY • link 14.8 years ago by lh3 33k

Are you asking this for amplicon (PCR product sequencing) or shotgun reads?

ADD REPLY • link 14.8 years ago by lexnederbragt ★ 1.3k

lh3, I see. Yes, denoising is indeed misleading, as I see people use it in a quite different context. I will re-edit the question in a minute.

ADD REPLY • link 14.8 years ago by Pawel Szczesny 3.2k

fixlex, mostly for amplicon-based reads.

ADD REPLY • link 14.8 years ago by Pawel Szczesny 3.2k

score 3 · Answer 1 · 2010-11-22

14.8 years ago

Istvan Albert 103k

The mothur package has a number of methods for 454 based read filtering.

Take a look at the trim.seqs command.
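mothur is driven by its own command syntax, so the following is not mothur code. It is only a rough Python sketch of the kind of per-read checks a trimming step like this typically applies: longest homopolymer, number of ambiguous bases, and mean quality. The function names and default thresholds are hypothetical; consult the mothur documentation for the actual options.

```python
from itertools import groupby


def max_homopolymer(seq):
    """Length of the longest run of identical bases in seq (0 for an empty read)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq.upper())), default=0)


def keep_read(seq, quals, max_homop=8, max_ambig=0, min_avg_q=25.0):
    """Return True if the read passes all three illustrative checks."""
    if not seq or len(seq) != len(quals):
        return False
    # Anything other than A, C, G, T counts as an ambiguous base here.
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    avg_q = sum(quals) / len(quals)
    return (max_homopolymer(seq) <= max_homop
            and ambiguous <= max_ambig
            and avg_q >= min_avg_q)
```

With these defaults, a read containing a single N, a nine-base homopolymer, or a mean quality below 25 would be rejected.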

ADD COMMENT • link 14.8 years ago by Istvan Albert 103k

score 1 · Answer 2 · 2010-12-10

If you have a reference or high coverage, then using the 454 toolchain for mapping or assembly should handle this. The specific error modality you refer to, homopolymer runs, does not require removing the reads, but rather careful calling of the bases in those runs. Recent versions of the Newbler software output a histogram of signal strengths for the homopolymer runs to allow you to see the distribution of signal at these sites.
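To illustrate why the signal distribution matters, here is a minimal sketch. It assumes that each flow signal is roughly proportional to the homopolymer length, so that a run is called by rounding to the nearest integer. It does not read Newbler output; the function names and the `margin` parameter are hypothetical.

```python
import math
from collections import Counter


def call_homopolymer(signal):
    """Call the run length as the nearest non-negative integer to the flow signal."""
    return max(0, int(signal + 0.5))


def is_ambiguous(signal, margin=0.2):
    """Flag signals that lie within `margin` of a half-integer rounding boundary."""
    frac = signal - math.floor(signal)
    return abs(frac - 0.5) < margin


def summarise_site(signals, margin=0.2):
    """Return (counts of called run lengths, number of ambiguous signals) for one site."""
    calls = Counter(call_homopolymer(s) for s in signals)
    n_ambiguous = sum(is_ambiguous(s, margin) for s in signals)
    return calls, n_ambiguous
```

If many reads at a site have signals close to a rounding boundary, the homopolymer length there deserves a closer look before a variant is called.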