Software For Quality Filtering Of 454 Data Sets
14.0 years ago

454 technology produces a number of errors in the reads, mostly (but not only) related to homopolymer runs. This calls for some degree of quality filtering, that is, removing reads that contain false information. Filtering is often based on quite simple measures, such as the number of consecutive low-quality bases and the read length. Are there any approaches to quality filtering other than the ones implemented in the PyroNoise/AmpliconNoise packages?
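As an illustration of the simple measures mentioned in the question, here is a minimal Python sketch of a read filter that rejects reads that are too short, contain ambiguous bases, or have a long run of consecutive low-quality bases. The `Read` class, the function names and every threshold (200 bases, Phred 20, runs longer than 5 bases, no Ns) are hypothetical choices for illustration, not values taken from PyroNoise, AmpliconNoise or any other package.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    # Hypothetical container: a read name, its bases and one Phred score per base.
    name: str
    seq: str
    qual: List[int]


def longest_low_quality_run(qual: List[int], min_q: int) -> int:
    """Length of the longest stretch of consecutive bases with Phred < min_q."""
    longest = current = 0
    for q in qual:
        current = current + 1 if q < min_q else 0
        longest = max(longest, current)
    return longest


def passes_filter(read: Read, min_len: int = 200, min_q: int = 20,
                  max_low_run: int = 5, max_n: int = 0) -> bool:
    """Keep a read only if it is long enough, has at most max_n ambiguous
    bases and no run of low-quality bases longer than max_low_run.
    All default thresholds are illustrative assumptions."""
    if len(read.seq) < min_len:
        return False
    if read.seq.upper().count("N") > max_n:
        return False
    return longest_low_quality_run(read.qual, min_q) <= max_low_run


reads = [
    Read("good", "ACGT" * 60, [32] * 240),
    Read("short", "ACGT" * 20, [32] * 80),
    Read("noisy_tail", "ACGT" * 60, [32] * 200 + [12] * 40),
]
print([r.name for r in reads if passes_filter(r)])  # ['good']
```

In practice the sequences and scores would typically come from a FASTA/quality file pair or an SFF file. An alternative design trims the read at the first low-quality stretch instead of discarding it; which is preferable depends on how much read length the downstream analysis needs.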


Most "denoising" methods come at the cost of losing information or data. When it is possible to process the raw data, it is better to work with that. 454 reads are not so difficult or different to process. I do not see much need for denoising, and few people are doing it.


When using unfiltered data, one risks overestimating the microbial diversity in metagenomic samples. See "The 'rare biosphere': a reality check".


I actually think the word "denoising" is a little misleading. All PyroNoise does is "model" sequencing errors, and because its primary goal is clustering, its "denoising" step is the right thing to do. Nonetheless, if you do not model sequencing errors in your application but simply rely on an independent denoising procedure, you will probably have to make compromises. My overall advice is: explicitly model sequencing errors in your application, and do not rely on a third-party "denoiser" that was not built for your application.


Are you asking this for amplicon (PCR product sequencing) or shotgun reads?


lh3, I see. Yes, "denoising" is indeed misleading, as I see people use it in quite a different context. I will re-edit the question in a minute.


fixlex, mostly for amplicon-based reads.

14.0 years ago

The mothur package has a number of methods for filtering 454 reads.

Take a look at the trim.seqs command:

14.0 years ago
Casbon

If you have a reference or high coverage, then using the 454 toolchain for mapping or assembly should handle this. The specific error mode you refer to, homopolymer runs, does not require removing the reads, but rather careful calling of the affected bases (those in homopolymer runs). Recent versions of the Newbler software output a histogram of signal strengths for homopolymer runs, so you can see the distribution of signal at these sites.
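To make the point about calling homopolymer bases concrete, here is a deliberately naive Python sketch of flowgram base calling: each flow's signal is rounded to the nearest whole homopolymer length, and the raw signals are grouped by called length so their spread can be inspected, loosely in the spirit of the Newbler signal histogram. It is not how Newbler calls bases. The cyclic flow order `TACG` and all function names are assumptions for illustration. Signals close to a half-integer (e.g. 2.5) are exactly the ambiguous homopolymer calls where a proper error model matters.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed cyclic nucleotide flow order; check your own run's metadata.
FLOW_ORDER = "TACG"


def called_length(signal: float) -> int:
    """Round a flow signal to the nearest whole homopolymer length
    (halves round up); negative signals are clamped to 0."""
    return max(0, int(signal + 0.5))


def call_flowgram(signals: List[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base calling: emit called_length(signal) copies of the
    nucleotide flowed at each position of the cyclic flow order."""
    return "".join(
        flow_order[i % len(flow_order)] * called_length(s)
        for i, s in enumerate(signals)
    )


def signals_by_length(signals: List[float]) -> Dict[int, List[float]]:
    """Group raw flow signals by their called homopolymer length, so the
    signal distribution for each length can be inspected or plotted."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for s in signals:
        groups[called_length(s)].append(s)
    return dict(groups)


flows = [1.02, 0.05, 2.1, 0.9, 0.03, 3.2, 0.1, 0.0]
print(call_flowgram(flows))  # TCCGAAA
```

If the groups for neighbouring lengths (say 4 and 5) overlap heavily, rounding alone will miscall some of those runs, which is why those bases deserve careful calling rather than discarding the whole read.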
