RDP classifier pipeline question (16S analysis)
7.8 years ago
manekineko ▴ 150

Hi, I have a question about 16S metagenomics analysis. My pipeline is:

USEARCH (dereplicate, cluster, and map reads to OTUs) -> RDP classifier
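The "map to OTU" step is what keeps track of copy number: every read is assigned to an OTU and counted, producing an OTU table. Below is a minimal toy sketch of that idea, not USEARCH's actual algorithm. It assumes equal-length, pre-aligned sequences and uses a plain position-by-position identity, whereas USEARCH aligns reads against the OTU sequences (for example via `-otutab` in recent versions, as far as I know). The OTU names, sequences, and the `min_id` value are hypothetical.

```python
def identity(a, b):
    # Fraction of identical positions; ASSUMES equal-length, pre-aligned sequences
    return sum(x == y for x, y in zip(a, b)) / len(a)

def map_to_otus(reads, otus, min_id=0.97):
    """Count reads per OTU: each read goes to its most similar OTU if identity >= min_id."""
    table = {name: 0 for name in otus}
    unmapped = 0
    for read in reads:
        best = max(otus, key=lambda name: identity(read, otus[name]))
        if identity(read, otus[best]) >= min_id:
            table[best] += 1
        else:
            unmapped += 1
    return table, unmapped

# Hypothetical toy OTU representative sequences and reads
otus = {"Otu1": "ACGTACGTAC", "Otu2": "TTTTGGGGCC"}
reads = ["ACGTACGTAC"] * 4 + ["ACGTACGTAA"] + ["TTTTGGGGCC"] * 2 + ["GGGGGGGGGG"]

table, unmapped = map_to_otus(reads, otus, min_id=0.9)
print(table, unmapped)  # → {'Otu1': 5, 'Otu2': 2} 1
```

The two OTUs stand for 7 of the 8 reads, so the OTU table, not the OTU FASTA file, is where the read counts live.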

I have a sample with 200k merged paired-end Illumina reads, and I know at least 90% of them are valid bacterial 16S sequences.

But after dereplication and clustering I get only ~6k sequences, which RDP classifies, and from the resulting file I calculate the percentage of each bacterial taxon. I think these 6k sequences are the non-redundant (unique) sequences, though, so aren't their copy numbers being ignored?
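If the taxonomy percentages are computed by counting each classified OTU once, copy number is indeed lost. Weighting each OTU by its read count from the OTU table recovers per-read percentages. A minimal sketch, assuming you already have a read count per OTU and one RDP assignment per OTU representative; the OTU names, counts, and genus labels are hypothetical toy data.

```python
from collections import defaultdict

def taxon_percentages(otu_counts, otu_taxon, weighted=True):
    """Percentage of each taxon, either per read (weighted=True) or per OTU."""
    totals = defaultdict(float)
    for otu, count in otu_counts.items():
        weight = count if weighted else 1  # per-read vs. per-OTU counting
        totals[otu_taxon.get(otu, "Unclassified")] += weight
    grand = sum(totals.values())
    return {taxon: 100.0 * w / grand for taxon, w in totals.items()}

# Hypothetical toy data: reads mapped to each OTU (from the OTU table)
otu_counts = {"Otu1": 150000, "Otu2": 30000, "Otu3": 20000}
# Hypothetical RDP genus assignment for each OTU representative sequence
otu_taxon = {"Otu1": "Escherichia", "Otu2": "Bacteroides", "Otu3": "Bacteroides"}

print(taxon_percentages(otu_counts, otu_taxon))  # → {'Escherichia': 75.0, 'Bacteroides': 25.0}
print(taxon_percentages(otu_counts, otu_taxon, weighted=False))  # per OTU: roughly 33% / 67%
```

The same three OTUs give very different pictures depending on whether the 200k reads or the 3 OTUs are counted.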

If I run the RDP classifier directly on the sample FASTA file, all sequences get classified, which sounds better. So I wonder: what is USEARCH actually doing here, and is it a necessary step?

6.8 years ago
gb ★ 2.2k

I think you are asking the wrong question, or you are not looking closely enough at your own data.

You say that you start with 200k merged reads and that after dereplication and clustering you can identify only 6k reads.

My question to you: what do you think dereplication and clustering do?
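As a hint: dereplication collapses identical reads into one unique sequence that carries its copy number, so many reads become far fewer sequences without losing any counts. A minimal sketch of that idea; the `Uniq1;size=N;` labels imitate the size annotation that, as far as I know, USEARCH writes with `-sizeout`, and `min_size` (for example, to drop singletons) is a hypothetical parameter here, not a USEARCH option name.

```python
from collections import Counter

def dereplicate(reads, min_size=1):
    """Collapse identical reads into unique sequences labelled with their copy number."""
    counts = Counter(reads)
    # Most abundant first (ties broken alphabetically), the order clustering usually expects
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [(seq, size) for seq, size in ranked if size >= min_size]
    return [(f"Uniq{i};size={size};", seq) for i, (seq, size) in enumerate(kept, start=1)]

# Hypothetical toy reads: 9 reads, only 3 distinct sequences
reads = ["ACGT"] * 5 + ["ACGA"] * 3 + ["TTGA"]

for label, seq in dereplicate(reads):
    print(f">{label}\n{seq}")
```

Nine reads become three sequences whose sizes still add up to 9; clustering then merges similar uniques into OTUs in the same spirit.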
