Algorithms and software for finding chimeras in 16S sequences work by comparing the ends of each query sequence to a reference database. Query sequences whose ends match two different 16S reference sequences are flagged as potential chimeras.
My question is as follows. If the query sequence (chimeric or not) originates from an entirely new species/phylum (as in metagenomics), its ends will obviously not align well with any existing reference 16S sequence.
In that case, how do existing chimera programs identify whether the query sequence is a chimera? Are existing programs validated for this scenario?
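To make the question concrete, here is a minimal Python sketch of the end-matching idea described above. It is a toy illustration, not the actual algorithm of ChimeraSlayer, Bellerophon, or any other tool. It assumes the query and all references are already aligned to the same length, and the names `classify`, `end_fraction`, and `min_identity` (and their default values) are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit(fragment, references, start, end):
    """Return (name, identity) of the reference closest to the fragment
    over the column range [start, end)."""
    return max(
        ((name, identity(fragment, ref[start:end]))
         for name, ref in references.items()),
        key=lambda hit: hit[1],
    )


def classify(query, references, end_fraction=0.3, min_identity=0.9):
    """Compare both ends of the query to the references.

    Assumption: query and references are pre-aligned and equally long.
    The thresholds are illustrative, not taken from any real tool.
    """
    n = len(query)
    k = max(1, int(n * end_fraction))
    left_name, left_id = best_hit(query[:k], references, 0, k)
    right_name, right_id = best_hit(query[n - k:], references, n - k, n)

    # Neither chimeric nor non-chimeric can be called if an end has no
    # close reference: this is the novel-lineage case asked about.
    if left_id < min_identity or right_id < min_identity:
        return "no_confident_hit"
    if left_name != right_name:
        return "potential_chimera"
    return "non_chimeric"


references = {
    "ref_a": "ACGTACGTACGTACGTACGT",
    "ref_b": "TTGCATTGCATTGCATTGCA",
}
chimera = references["ref_a"][:10] + references["ref_b"][10:]
novel = "G" * 20  # resembles neither reference

for label, seq in [("chimera", chimera), ("ref_a", references["ref_a"]), ("novel", novel)]:
    print(label, classify(seq, references))
```

In this sketch a query whose ends match nothing in the database falls into a third category rather than being called either way, which is exactly the situation I am asking about.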
If I understand correctly, the primers are based on the conserved portions of the 16S gene. In principle, a 16S sequence from an entirely new organism can be pulled out using a conserved portion as a primer. This means that it is possible to catch a completely new/dissimilar 16S gene using primers designed from conserved regions. Please correct me if I have got the concept wrong.
So-called universal primers are not as universal as they were once considered. So while it's possible to sequence a 16S gene that cannot be assigned to any known organism, you cannot be 100% sure you pulled out all the 16S rRNAs that were in the sample. From that perspective, using universal primers might easily give you a certain number of unassignable 16S rRNAs, but it shouldn't be an issue for chimera detection software, as it uses information from the sample, not from taxonomy/databases.
I thought that software like ChimeraSlayer, Bellerophon, etc. uses information from known 16S sequences. I am not aware of any software that performs de novo detection of chimeric sequences by processing information from the sample itself. Your comments, please.
Indeed, ChimeraSlayer (CS) compares reads to the database (I thought it just used the alignment to identify conserved regions and compared these regions within the sample; it didn't make sense to me to compare reads to a database containing things that are not there). Nevertheless, in the case of universal primers, the conserved regions are preserved well enough for CS to perform as expected.
Sorry if I am getting on your nerves. What if the chimera has occurred between two variable regions?
Sorry, I've been terribly busy... I'm simply not sure what the answer to your question is. Common sense says that as long as alignment is possible (the conserved regions really are conserved; this is what I meant by referring to universal primers), CS should operate as expected, no matter where the chimera starts and ends (assuming a chimera of two reads). But I haven't checked whether CS has some kind of threshold value for the alignment.
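To illustrate why the breakpoint position should not matter once an alignment exists, here is a rough Python sketch that tries every breakpoint and asks whether two different parents explain the query better than any single reference. This is only a toy under stated assumptions and I am not claiming this is what CS does internally: the sequences are assumed to be pre-aligned and equally long, and `best_two_parent_model` and `min_segment` are hypothetical names.

```python
def matches(a, b):
    """Number of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b))


def best_parent(segment, references, start, end):
    """Return (name, match_count) of the reference closest to the segment
    over the column range [start, end)."""
    return max(
        ((name, matches(segment, ref[start:end]))
         for name, ref in references.items()),
        key=lambda hit: hit[1],
    )


def best_two_parent_model(query, references, min_segment=3):
    """Scan all breakpoints and return (gain, breakpoint, left, right).

    gain is how many more positions a two-parent model explains than the
    best single reference. Returns (0, None, None, None) if no two-parent
    model beats the single-parent model.
    """
    n = len(query)
    best_single = max(matches(query, ref) for ref in references.values())
    best = (0, None, None, None)
    for bp in range(min_segment, n - min_segment + 1):
        left_name, left_score = best_parent(query[:bp], references, 0, bp)
        right_name, right_score = best_parent(query[bp:], references, bp, n)
        if left_name == right_name:
            continue  # same parent on both sides: not a chimera model
        gain = left_score + right_score - best_single
        if gain > best[0]:
            best = (gain, bp, left_name, right_name)
    return best


references = {
    "ref_a": "ACGTACGTACGTACGTACGT",
    "ref_b": "TTGCATTGCATTGCATTGCA",
}
chimera = references["ref_a"][:10] + references["ref_b"][10:]

print(best_two_parent_model(chimera, references))
print(best_two_parent_model(references["ref_a"], references))
```

The scan treats every column the same way, so a breakpoint in a variable region is found just like one in a conserved region, provided the alignment itself can be trusted. A real tool would also need a minimum-gain or minimum-identity cutoff, which is the threshold I haven't checked for in CS.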