Question

Evaluation Of 16S Chimera Finding Algorithms With Query 16S Sequences From Hitherto Unknown Organisms

3

14.7 years ago

Monzoor ▴ 300

Algorithms and software for finding chimeras in 16S sequences work by comparing the ends of each query sequence to a reference database. Query sequences whose ends match two different 16S reference sequences are flagged as potential chimeras.
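To make the end-matching idea concrete, here is a minimal toy sketch. It is not how Chimera Slayer, Bellerophon or any other real tool is implemented: the k-mer sharing score, the function names (`kmers`, `best_parent`, `classify`) and the thresholds (`end_len`, `min_score`) are all assumptions chosen for illustration. It does show the scenario the question is about: a query whose ends match nothing in the reference set cannot be classified either way.

```python
import random


def kmers(seq, k=8):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_parent(fragment, references, k=8):
    """Return (name, score) of the reference sharing the most k-mers with fragment.

    score is the fraction of the fragment's k-mers that occur in the reference.
    This is a crude stand-in for a real alignment score (an assumption).
    """
    frag_kmers = kmers(fragment, k)
    best_name, best_score = None, 0.0
    for name, ref_seq in references.items():
        score = len(frag_kmers & kmers(ref_seq, k)) / max(len(frag_kmers), 1)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def classify(query, references, end_len=30, min_score=0.8, k=8):
    """Compare both ends of query to the references (hypothetical thresholds)."""
    left_name, left_score = best_parent(query[:end_len], references, k)
    right_name, right_score = best_parent(query[-end_len:], references, k)
    if left_score < min_score or right_score < min_score:
        # At least one end has no good reference match: the novel-organism case.
        return "no_reference_match"
    if left_name != right_name:
        return "potential_chimera"
    return "not_flagged"


# Toy data: random sequences standing in for 16S references (not real data).
rng = random.Random(0)
ref_a = "".join(rng.choice("ACGT") for _ in range(100))
ref_b = "".join(rng.choice("ACGT") for _ in range(100))
novel = "".join(rng.choice("ACGT") for _ in range(100))
references = {"A": ref_a, "B": ref_b}

for label, query in [("copy of A", ref_a),
                     ("A/B hybrid", ref_a[:50] + ref_b[50:]),
                     ("novel", novel)]:
    print(label, "->", classify(query, references))
```

In this sketch the novel sequence ends up in a third category rather than being called chimeric or non-chimeric, which is exactly the gap asked about below.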

My question is as follows. If the query sequence (whether chimeric or not) originates from an entirely new species or phylum (as in metagenomics), its ends will obviously not align with any existing reference 16S sequence.

In that case, how do existing chimera programs decide whether the query sequence is a chimera? Have existing programs been validated for this scenario?

rrna algorithm • 5.3k views

ADD COMMENT • link updated 14.3 years ago by Jake ▴ 150 • written 14.7 years ago by Monzoor ▴ 300

score 3 · Answer 1 · 2010-12-10

3

14.7 years ago

Pawel Szczesny 3.2k

Correct me if I'm wrong, but you need to amplify a 16S fragment in order to sequence it. For that you need some kind of primers: if not universal ones (see the Wikipedia article), then ones you design yourself. If you don't know what you are looking for, you won't find it, will you? In other words, it's hard to imagine catching a completely new/dissimilar 16S gene using primers designed de novo without any reference or starting point.

That said, a significant portion of the chimeras detected in metagenomic samples are in fact real 16S sequences. For that reason the authors of Chimera Slayer (see Microbiome Utilities) state:

It is not recommended to blindly discard all sequences flagged as chimeras. Some may represent naturally formed chimeras that do not represent PCR artifacts. Sequences flagged may warrant further investigation.

As far as I know, further investigation means manual inspection of the alignment. I have never done it myself, but the people I hand the data over to do it on a regular basis.

ADD COMMENT • link 14.7 years ago by Pawel Szczesny 3.2k

0

If I understand correctly, the primers are based on the conserved portions of the 16S gene. In principle, a 16S sequence from an entirely new organism can be pulled out using a conserved portion as a primer. This means that it is possible to catch a completely new/dissimilar 16S gene using primers designed from conserved regions. Please correct me if I have got the concept wrong.

ADD REPLY • link 14.7 years ago by Monzoor ▴ 300

0

So-called universal primers are not as universal as they were once considered to be. So while it's possible to sequence 16S that cannot be assigned to any known organism, you cannot be 100% sure you pulled out all the 16S rRNAs that were in the sample. From that perspective, using universal primers might easily give you a certain number of unassignable 16S rRNAs, but that shouldn't be an issue for chimera detection software, as it uses information from the sample, not from taxonomy/databases.

ADD REPLY • link 14.7 years ago by Pawel Szczesny 3.2k

0

I thought that software like Chimera Slayer, Bellerophon, etc. uses information from known 16S sequences. I am not aware of any software that performs de novo detection of chimeric sequences by processing information from the sample itself. Your comments, please.

ADD REPLY • link 14.7 years ago by Monzoor ▴ 300

0

Indeed, CS compares reads to the database. (I thought it just used the alignment to identify conserved regions and compared these regions within the sample; it didn't make sense to me to compare reads to a database containing things that are not there.) Nevertheless, in the case of universal primers, conserved regions are preserved well enough for CS to perform as expected.

ADD REPLY • link 14.7 years ago by Pawel Szczesny 3.2k

0

Sorry if I am getting on your nerves. What if the chimera has formed between two variable regions?

ADD REPLY • link 14.7 years ago by Monzoor ▴ 300

0

Sorry, I've been terribly busy... I'm simply not sure what the answer to your question is. Common sense would say that as long as alignment is possible (conserved regions are really conserved; this is what I meant by referring to universal primers), CS should operate as expected, no matter where the chimera starts and ends (assuming a chimera of two reads). But I haven't checked whether CS has some kind of threshold value for the alignment.

ADD REPLY • link 14.7 years ago by Pawel Szczesny 3.2k

score 1 · Answer 2 · 2011-11-01

PCR-generated chimeras are a problem in metagenomics. If they are ignored, they will result in much higher measurements of sequence diversity. Have you considered looking at Perseus in the AmpliconNoise package (http://code.google.com/p/ampliconnoise/)? Their paper nicely summarises the (your) problem:

For real pyrosequencing data we will not know a priori what sequences should be present and therefore chimera identification algorithms are necessary

I haven't actually run Perseus myself, but it looks pretty interesting. I work mostly with shotgun sequences and I couldn't quite work out if it would work on that data.

score 0 · Answer 3 · 2011-10-31

You cannot know whether an entirely new PCR sequence is a chimera, though if no part of it is highly similar to known sequences and it is seen multiple times and in multiple samples, I think you can accept that it isn't.
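The recurrence part of that rule of thumb could be sketched roughly as follows. This is only an illustration: the function name `recurrent_sequences`, the thresholds `min_count` and `min_samples`, and the use of exact string matching (real data would more likely be clustered first) are all assumptions, and the check against known sequences is left out.

```python
from collections import Counter, defaultdict


def recurrent_sequences(samples, min_count=2, min_samples=2):
    """Return sequences seen at least min_count times in at least min_samples samples.

    samples maps a sample name to a list of read sequences.
    Reads are compared by exact string identity (a simplifying assumption).
    """
    total = Counter()            # total occurrences of each sequence
    seen_in = defaultdict(set)   # names of the samples each sequence appears in
    for sample_name, reads in samples.items():
        for read in reads:
            total[read] += 1
            seen_in[read].add(sample_name)
    return {seq for seq in total
            if total[seq] >= min_count and len(seen_in[seq]) >= min_samples}


# Toy input with made-up sample names and reads.
samples = {
    "sample_1": ["ACGTACGT", "ACGTACGT", "TTTTCCCC"],
    "sample_2": ["ACGTACGT", "GGGGAAAA"],
}
print(recurrent_sequences(samples))
```

A sequence that passes this filter is only more plausible as a real template, not proven to be one.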

For metagenomic sequences, chimeras are not a problem, since they get generated during PCR. This has been published.