Hello everyone,
I have 16S data and I'm trying to identify which genera/species are in my samples and their relative abundance.
It's my first time working with 16S data and I'm also not used to classification, so I'm trying to improve what I already did and understand some concepts.
So far, to identify what's in my samples I've been relying on classifiers like Kraken or Centrifuge and 16S databases like SILVA or Greengenes. While they give results that are fine on my test samples (I'm looking at genus level for now), my "pipeline" only consists of cleaning the files with tools like Trimmomatic or fastp before feeding them to the main classification tool.
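To be clear about what I mean by relative abundance: it is just the number of reads assigned to each genus divided by the total number of classified reads. A minimal sketch, assuming the per-genus read counts have already been pulled out of the classifier report (the genus names and counts below are made up for illustration, and `relative_abundance` is my own helper, not part of any tool):

```python
def relative_abundance(genus_counts):
    """Convert per-genus read counts into fractions of all classified reads."""
    total = sum(genus_counts.values())
    if total == 0:
        return {}
    return {genus: count / total for genus, count in genus_counts.items()}

# Hypothetical counts, not real results
counts = {"Bacteroides": 600, "Lactobacillus": 300, "Escherichia": 100}
abundance = relative_abundance(counts)

# Print genera from most to least abundant
for genus, frac in sorted(abundance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genus}\t{frac:.1%}")
```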
I feel like this pipeline is very light and was wondering what I could do to improve it. I was thinking about doing some assembly beforehand (I'm currently trying ABySS and SPAdes). I also noticed that people suggest clustering reads into OTUs, but I don't really understand why you would group them, especially knowing that some bacterial species share more than 98% sequence identity on their 16S.
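My rough understanding of OTU clustering so far is the toy sketch below: each read joins the first existing cluster whose representative it matches at or above an identity threshold (97% is the value I see most often), otherwise it starts a new cluster. This is only an illustration under strong assumptions: sequences are the same length and compared position by position with no alignment, and `identity`, `greedy_otus`, and `mutate` are hypothetical helpers of mine. As far as I understand, real clustering tools align the sequences and usually process them in order of abundance.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else open a new OTU."""
    centroids = []  # one representative sequence per OTU
    otus = []       # members of each OTU, parallel to centroids
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(seq)
                break
        else:
            centroids.append(seq)
            otus.append([seq])
    return otus

def mutate(seq, positions):
    """Introduce a substitution at each given position (toy mutation)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

base = "ACGT" * 25                     # 100 bp toy "amplicon"
near = mutate(base, [3, 50])           # 2 mismatches, 98% identical to base
far = mutate(base, range(0, 100, 10))  # 10 mismatches, 90% identical to base

otus = greedy_otus([base, near, far])
print([len(members) for members in otus])
```

In this toy case the 98% variant is absorbed into the same OTU as the original sequence, while the 90% variant starts its own OTU. So the grouping hides small differences, whether they come from sequencing errors or from genuinely close species.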
What am I missing about OTUs, and what can I do to improve my classification?
EDIT: I found this document to be really interesting as it covers a lot of things.
Which 16S region do you have?
Also, there's no assembly needed for 16S data.
Do you have Illumina data?
Right now I have Illumina data targeting V3 & V4.