A lot of OTUs with no reference
7.1 years ago
agata88 ▴ 870

Hi all!

I have detected more than 2000 OTUs, of which only 25% have an assigned taxonomy. The read quality is good. I suspect it is a soil sample, but I am not sure. I do know the data are MiSeq, V3-V4.

A BLAST search of the unassigned OTUs returns "uncultured bacteria".

My questions are: What could be the reason for detecting so many OTUs? How should I interpret OTUs with no taxonomy?

Many thanks for any suggestions,

Best,

Agata

16S • 3.0k views

I say this as someone who has no idea what OTUs are or how they are detected: your question needs some context. Is there a specific method or tool you used to detect the OTUs? You say your data is from MiSeq; what did you use to process it? What organism is this data from?

Maybe my lack of knowledge on this is generating more questions than would be necessary for an expert on OTUs, but context always helps.


Which tool did you use, and which database did you compare against? Was the taxonomy entirely unassigned, or unassigned only at a particular level (species, genus, family, etc.)?


I use my own reference taxonomy, but I get the same results with Greengenes (97_otu.fasta). The taxonomy was not assigned at any level.


At what percentage identity are you clustering your OTUs? Have you filtered out singletons and doubletons? What are you blasting your OTU representative sequences against? Why aren't you using a more sophisticated tool, e.g. the RDP classifier?
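
In case it helps to be concrete about the singleton/doubleton filter: a minimal sketch, assuming the OTU table is a plain dict mapping OTU IDs to per-sample read counts. The function name `drop_rare_otus`, the `min_total` cutoff, and the toy counts are all made up for illustration; they are not from any particular pipeline.

```python
def drop_rare_otus(otu_counts, min_total=3):
    """Keep only OTUs whose total read count across all samples is >= min_total.

    With min_total=3 this removes singletons (1 read) and doubletons (2 reads).
    """
    return {
        otu: counts
        for otu, counts in otu_counts.items()
        if sum(counts) >= min_total
    }


# Hypothetical toy table: OTU ID -> read counts in two samples.
otu_counts = {
    "OTU_1": [120, 85],
    "OTU_2": [1, 0],  # singleton
    "OTU_3": [1, 1],  # doubleton
    "OTU_4": [0, 3],
}

filtered = drop_rare_otus(otu_counts)
print(sorted(filtered))
```

Here only `OTU_1` and `OTU_4` survive. Whether the cutoff should apply to the total across samples or per sample is a design choice; this sketch uses the total.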


97%, and I've filtered out singletons and doubletons. I am blasting against my own reference, but I get the same results with Greengenes (97_otu.fasta).

I should say that I am sure this is not a software problem, because the software has been validated many times and worked very well.

I am more curious about the biological side. Could the primers not be working well? Could these be new species that are not annotated in any reference?


Classify your OTU representative sequences against SILVA or RDP. Also, for partial 16S sequences you should consider 99% or even 100% identity for OTU clustering; see e.g. https://www.biorxiv.org/content/early/2017/09/21/192211
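
To illustrate how the identity threshold changes the number of OTUs, here is a toy sketch of greedy centroid clustering. It assumes the sequences are already aligned to the same length and uses simple per-position identity, which is much cruder than what real clustering tools do. The helper names (`identity`, `greedy_cluster`, `mutate`) and the synthetic 200 nt sequences are invented for the example.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Put each sequence into the first cluster whose centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new cluster."""
    centroids, clusters = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = "A" if chars[p] != "A" else "C"
    return "".join(chars)


base = "ACGT" * 50                        # synthetic 200 nt sequence
v1 = mutate(base, [10])                   # 1 mismatch  -> 99.5% identical to base
v2 = mutate(base, [20, 60, 100, 140])     # 4 mismatches -> 98% identical to base

for threshold in (0.97, 0.99, 1.0):
    print(threshold, len(greedy_cluster([base, v1, v2], threshold)))
```

With these three toy sequences the cluster count goes from 1 at 97% to 2 at 99% to 3 at 100%, which is the general effect of tightening the threshold: closely related variants stop being merged.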


99-100% is required only for species level. It doesn't mean he can't get valuable information with less than that.


I'm talking about the OTU clustering threshold, not the taxonomy annotation threshold.


Yes, I think this is too strict. If you have good-quality reads and a high number of reads, you can assign taxonomy to species level even with 97% OTU clustering.


Thank you all for your help. After considering your suggestions, I've decided to push up to 100% OTU clustering for species discovery. I know that I will lose a lot of biologically meaningful information, but since I am only interested in species, I think this is the best option :)

PS. I've checked the 100% OTU clustering threshold on a public mock community sample, and it works better than 97% OTU clustering.


If you're looking for a relative with 97% identity, you're likely not to find any match; it's too strict for soil samples. You should use other methods that take a broader look and can classify sequences to higher taxonomic levels (genus, family, etc.), mothur for instance. You can also try running some OTUs through the SILVA classifier.


Why is 97% too strict for soil samples?


Because the vast majority of the jungle in there has never been isolated; this is terra nova.


I don't think a 97% match is too strict for soil samples. OP is getting hits; it's just that they're to "uncultured bacteria". With RDP and SILVA at least, each included 16S sequence comes with lineage information, e.g. "Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: Enterobacteriaceae: Escherichia; uncultured bacteria". So all OP has to do is look at the information at genus or family level.
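
As a rough sketch of what "look at genus or family level" could mean in practice, the snippet below pulls the deepest named rank out of a lineage string like the one quoted above. It assumes the lineage lists exactly one label per rank, in order from domain down, and that uninformative labels start with "uncultured"; real database headers vary, so both assumptions may need adjusting. The function name `deepest_named_rank` and the `RANKS` list are made up for the example.

```python
# Assumed rank order; real lineages may skip or add ranks.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def deepest_named_rank(lineage):
    """Return (rank, name) for the deepest informative label in a lineage string,
    or None if nothing informative is present."""
    # Treat both ':' and ';' as separators, as in the example lineage.
    parts = [p.strip() for p in lineage.replace(";", ":").split(":")]
    # Drop empty fields and placeholders such as "uncultured bacteria".
    named = [p for p in parts if p and not p.lower().startswith("uncultured")]
    ranked = list(zip(RANKS, named))
    return ranked[-1] if ranked else None


lineage = (
    "Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: "
    "Enterobacteriaceae: Escherichia; uncultured bacteria"
)
print(deepest_named_rank(lineage))
```

For the lineage above this reports the genus Escherichia, even though the final label is "uncultured bacteria". A hit whose lineage stops at, say, phylum would be reported at that level instead.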


This kind of result only means that someone sequenced this 16S, predicted its lineage, and deposited it (or SILVA predicted the lineage of a deposited 16S sequence). Predicting lineage from matches below 97% is basically the same.


This is the information I was looking for. Thanks.
