Hi, I have a question about 16S metagenomics analysis. My pipeline is:
USEARCH (dereplicate, cluster and map to OTU) -> RDP classifier
I have a sample with 200k joined paired-end sequences from Illumina, and I know that at least 90% of them are valid bacterial 16S.
But after dereplication and clustering I get only ~6k sequences. These are classified with RDP, and the final file is what I would use to calculate the percentage of each bacterial taxon. I think these 6k sequences are the non-redundant sequences, though, so their copy numbers are not taken into account?
If I run the RDP Classifier directly on the sample FASTA file, all sequences get classified, which sounds better. So I wonder what USEARCH is actually doing here, and is it a necessary step?
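
To make the copy-number concern concrete, here is a minimal Python sketch of the difference I mean. It assumes the sequence labels carry a `;size=N;` abundance annotation (as USEARCH writes when run with `-sizeout`); the function names, labels, and genus assignments are made up for illustration and are not output from my data.

```python
import re
from collections import Counter

# Matches a USEARCH-style abundance annotation such as ";size=900;"
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def parse_size(label):
    """Return the abundance stored in the label, or 1 if none is present."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else 1


def taxon_percentages(assignments, weighted=True):
    """Percentage per taxon from (label, taxon) pairs.

    weighted=True counts each sequence by its size annotation;
    weighted=False counts each non-redundant sequence once.
    """
    counts = Counter()
    for label, taxon in assignments:
        counts[taxon] += parse_size(label) if weighted else 1
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical classified representatives (made-up values)
example = [
    ("Otu1;size=900;", "Bacteroides"),
    ("Otu2;size=90;", "Lactobacillus"),
    ("Otu3;size=10;", "Escherichia"),
]

print(taxon_percentages(example, weighted=True))
print(taxon_percentages(example, weighted=False))
```

With the made-up values above, the weighted result is dominated by the first taxon, while the unweighted result treats all three as equally common. That gap is what I am worried about when percentages are computed from the ~6k sequences alone.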