Is it required for all sequences to be the same length when clustering OTUs in metagenomics (ONT Nanopore)?
2.3 years ago
O.rka ▴ 740

I am following a pipeline one of my collaborators created for using ONT reads for 16S rRNA OTU clustering. In one of the steps, they truncate all of the reads to the same length (e.g., 1400).

Is it required to have all the sequences at the same length? I feel like I'm arbitrarily throwing away useful information.

otu metagenomics clustering nanopore • 1.4k views
2.3 years ago

Hi,

In my opinion, yes, it is. Of course, this depends on several variables, such as the primers used, the expected gene length, and the pipeline/alignment method used.

OTUs (Operational Taxonomic Units) are defined by a similarity threshold, such as 97-99%. This means that for a particular OTU, let's say OTU1, the sequences that make up OTU1 share a sequence similarity of >97-98% (based on sequence alignment).

In general, sequences of the same length are easier and faster to align, so the best alignment is resolved more quickly.

Depending on the alignment algorithm, if it uses some kind of global alignment, shorter sequences will score a lower similarity than longer ones even when they align perfectly to a longer sequence, simply because they do not span the whole sequence and therefore yield a lower percent identity.
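To make the point about global alignment concrete, here is a minimal sketch in plain Python. It is not taken from any particular clustering tool: the function names `identity_global` and `identity_overlap`, the toy reference, and the gapless placement at a fixed offset are all assumptions made for illustration. It compares percent identity when the denominator is the full alignment length (end gaps penalised) with identity computed over the overlapping region only.

```python
def count_matches(ref: str, query: str, offset: int = 0) -> int:
    """Count identical bases when query is placed, without gaps, at ref[offset:]."""
    return sum(
        1
        for i, base in enumerate(query)
        if offset + i < len(ref) and ref[offset + i] == base
    )


def identity_global(ref: str, query: str, offset: int = 0) -> float:
    """Identity over the full alignment length, so unaligned ends count against it."""
    aln_len = max(len(ref), offset + len(query))
    return count_matches(ref, query, offset) / aln_len


def identity_overlap(ref: str, query: str, offset: int = 0) -> float:
    """Identity over the overlapping columns only (end gaps ignored)."""
    overlap = min(len(query), len(ref) - offset)
    if overlap <= 0:
        raise ValueError("query does not overlap the reference")
    return count_matches(ref, query, offset) / overlap


# Toy data (hypothetical): a 1400-base reference and a perfect 1200-base read.
reference = "ACGT" * 350
short_read = reference[:1200]

print(f"{identity_global(reference, short_read):.3f}")   # 0.857
print(f"{identity_overlap(reference, short_read):.3f}")  # 1.000
```

In this toy case the shorter read matches the reference perfectly, yet a 97% threshold applied to the end-gap-penalised identity would keep it out of the OTU. Truncating every read to a common length removes that effect. Whether a given clustering tool counts terminal gaps this way depends on the tool and its settings.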

I hope this helps,

António


This actually helps a lot! I guess that's one of the technical differences between performing ASV and OTU analysis that is under the hood.


Absolutely. With ASVs you're working with exact sequences. Even so, you should always check whether the ASV sequence lengths fall within the range you expect (based on the primers used; see the DADA2 tutorial): https://benjjneb.github.io/dada2/tutorial.html (quoted below)

Considerations for your own data: Sequences that are much longer or shorter than expected may be the result of non-specific priming. You can remove non-target-length sequences from your sequence table (eg. seqtab2 <- seqtab[,nchar(colnames(seqtab)) %in% 250:256]). This is analogous to “cutting a band” in-silico to get amplicons of the targeted length.
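For readers working outside R, the same length filter can be sketched in plain Python. This is only an analogue of the R one-liner quoted above, not part of DADA2: the function name `filter_by_length` and the toy sequences are hypothetical, and the 250-256 range is simply the one used in the tutorial's example, so it would need adjusting to your own amplicon.

```python
def filter_by_length(seqs: dict[str, str], min_len: int, max_len: int) -> dict[str, str]:
    """Keep only sequences whose length lies within [min_len, max_len]."""
    return {name: seq for name, seq in seqs.items() if min_len <= len(seq) <= max_len}


# Toy sequences (hypothetical): one on-target, one too short, one too long.
toy_seqs = {
    "asv_on_target": "A" * 253,
    "asv_too_short": "A" * 240,
    "asv_too_long": "A" * 300,
}

kept = filter_by_length(toy_seqs, 250, 256)
print(sorted(kept))  # ['asv_on_target']
```

Unlike truncation, this drops whole sequences outside the expected range and leaves the retained sequences unmodified.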

António
