Question

Using kraken2 to fish out only 16S reads

2.5 years ago

Rezenman • 0

Hey all,

I have metagenomics data from several samples, from which I want to fish out only the 16S reads. My plan is to use Kraken2 to match reads against the SILVA database and keep only the classified reads, treating them as 16S reads. The one thing I am unsure about is what threshold to set for the confidence score. I would love some advice from the experienced community here. My reads are 151 bp paired-end.

If anyone knows another useful tool for achieving the same goal, I would love to hear about it.

Thanks a lot! Shahar

microbiome metagenomics 16S Deep-sequencing • 2.2k views

ADD COMMENT • link updated 2.5 years ago by Mensur Dlakic ★ 28k • written 2.5 years ago by Rezenman • 0

I'm not sure Kraken is the right tool for that. Why not use an aligner such as bwa against the SILVA database?

ADD REPLY • link 2.5 years ago by Asaf 10k

Hey, we have tried other alignment tools, but setting the identity threshold turned out to be problematic there as well. So we wanted to try tools that are more geared towards microbiome analysis, and since Kraken is widely used in the microbiome community, I thought it might be relevant. Thanks for the suggestion!

ADD REPLY • link 2.5 years ago by Rezenman • 0

What primers did you use? Why do you consider it necessary to filter them out? Besides, what about unclassified/unknown 16S reads? Leaving them out would bias the normalization. I am not an expert in the subject; I just want to understand the rationale behind your analysis.

ADD REPLY • link 2.5 years ago by Buffo ★ 2.4k

This data comes from metagenomics, so there are no primers, just whole-population sequencing. We want to pull out all 16S reads for downstream analysis. And yes, you are right about the unclassified reads; that's why I asked for suggestions regarding the confidence score. I am thinking of keeping it relatively low so that "weak" matches are retained as well. Thanks for the response!

ADD REPLY • link 2.5 years ago by Rezenman • 0

As per the tutorial here on ONT reads (https://usegalaxy.org/training-material/topics/metagenomics/tutorials/nanopore-16S-metagenomics/tutorial.html), a confidence score of 0.1 is used for classification with the SILVA db.

ADD REPLY • link 2.5 years ago by cpad0112 21k
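
For readers who want to try the 0.1 starting point from the reply above on paired-end reads, here is a minimal Python sketch that only assembles a Kraken2 command line. The database directory, file names, output prefix, and the helper name `build_kraken2_command` are all hypothetical placeholders; the flags themselves (`--db`, `--paired`, `--confidence`, `--classified-out`, `--output`, `--report`, `--threads`) are standard Kraken2 options. Treat it as a starting point, not a recommended threshold for 151 bp reads.

```python
from typing import List


def build_kraken2_command(
    db_dir: str,
    reads_1: str,
    reads_2: str,
    out_prefix: str,
    confidence: float = 0.1,
    threads: int = 4,
) -> List[str]:
    """Assemble a Kraken2 command that writes classified read pairs to FASTQ.

    All paths are placeholders chosen by the caller. The default confidence
    of 0.1 mirrors the value quoted in the thread and is only an assumption.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Kraken2 confidence must be between 0 and 1")
    return [
        "kraken2",
        "--db", db_dir,
        "--paired",
        "--confidence", str(confidence),
        "--threads", str(threads),
        # With --paired, Kraken2 expects a '#' in the file name; it is
        # replaced by _1 and _2 for the two mates.
        "--classified-out", f"{out_prefix}_classified#.fastq",
        "--output", f"{out_prefix}.kraken2.tsv",
        "--report", f"{out_prefix}.report.txt",
        reads_1,
        reads_2,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without Kraken2 installed.
    print(" ".join(build_kraken2_command(
        "silva_db", "sample_R1.fastq", "sample_R2.fastq", "sample")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Kraken2 and a SILVA-based database are available locally.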

Great, that might be a good start. However, they are working with full-length 16S sequences, whereas we have short reads from whole-population sequencing, so I guess some modifications are needed. Thanks!

ADD REPLY • link 2.5 years ago by Rezenman • 0
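
One way to explore how far the threshold could be lowered for short reads is to inspect the per-read k-mer support in Kraken2's standard output file before committing to a value. The sketch below assumes the default tab-separated output (classification flag, read ID, taxon, length, k-mer mapping string) and computes, for each classified read, the fraction of non-ambiguous k-mers that hit anything in the database. This is not Kraken2's confidence score, which is computed relative to the assigned clade; it is a simpler quantity that can only be equal to or higher than that score. The function names are hypothetical.

```python
from typing import Dict, Iterable


def kmer_hit_fraction(kmer_field: str) -> float:
    """Fraction of non-ambiguous k-mers that matched any taxon.

    The field looks like '562:13 0:4 A:2 |:| 562:10', where 0 means no hit,
    A marks k-mers with ambiguous bases, and '|:|' separates the two mates.
    """
    hits = 0
    total = 0
    for token in kmer_field.split():
        if token == "|:|":
            continue  # mate-pair separator, carries no counts
        taxid, _, count = token.rpartition(":")
        if taxid == "A":
            continue  # ambiguous k-mers are excluded from the denominator
        n = int(count)
        total += n
        if taxid != "0":
            hits += n
    return hits / total if total else 0.0


def classified_reads(lines: Iterable[str]) -> Dict[str, float]:
    """Map each classified read ID to its k-mer hit fraction."""
    result: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != "C":
            continue  # skip unclassified ('U') or malformed lines
        result[fields[1]] = kmer_hit_fraction(fields[4])
    return result


if __name__ == "__main__":
    # Two made-up lines in Kraken2's output layout, for illustration only.
    example = [
        "C\tread1\t562\t151|151\t562:10 0:5 A:2 |:| 562:5\n",
        "U\tread2\t0\t151|151\t0:117 |:| 0:117\n",
    ]
    for read_id, frac in classified_reads(example).items():
        print(read_id, round(frac, 2))
```

Running Kraken2 once with the confidence at 0 and plotting the distribution of these fractions may help show where weak matches separate from clear ones, though where to cut remains a judgment call for the data at hand.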

score 2 · Accepted Answer · 2022-06-07

2.5 years ago

Mensur Dlakic ★ 28k

There are many tools for this purpose.

https://github.com/hzi-bifo/RiboDetector (needs no database)
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html (mirabait, comes with its own database)
https://bioinfo.lifl.fr/RNA/sortmerna/ (SortMeRNA)
https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/35881-introducing-bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries?t=41288 (BBsplit)
https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ (BBduk)

ADD COMMENT • link 2.5 years ago by Mensur Dlakic ★ 28k

Great, I'll definitely take a look at those. Do you have a favorite?

ADD REPLY • link 2.5 years ago by Rezenman • 0

I generally use MIRA for this purpose, but RiboDetector features what seems to be the most universal approach.

ADD REPLY • link 2.5 years ago by Mensur Dlakic ★ 28k