7.2 years ago
agata88
Hi all!
I have more than 2000 OTUs detected, of which only 25% have assigned taxonomy. The read quality is good. I suspect it is a soil sample, but I am not sure. I know it's MiSeq, V3-V4.
After a BLAST search of the unassigned OTUs, the results show "uncultured bacteria".
My questions are: What could be the reason for detecting so many OTUs? How should I interpret OTUs with no taxonomy?
Many thanks for any suggestions,
Best,
Agata
I say this as someone who has no idea what OTUs are or how they are detected: your question needs some context. Is there a specific method/tool you used to detect OTUs? You say your data is from MiSeq; what did you use to process this data? What organism is this data from?
Maybe my lack of knowledge on this is generating more questions than would be necessary for an expert on OTUs, but context always helps.
What tool did you use, and which database did you compare against? Was the taxonomy completely unassigned, or unassigned only at the species/genus/family etc. level?
I have my own taxonomy, but I get the same results with Greengenes 97_otu.fasta. The taxonomy was not assigned at all.
At what percentage identity are you clustering your OTUs? Have you filtered out singletons and doubletons? What are you blasting your OTU representative sequences against? Why aren't you using a more sophisticated tool, e.g. the RDP classifier?
97%, and I've filtered singletons and doubletons. I am blasting against my own reference, but I get the same results with Greengenes (97_otu.fasta).
I should say that I am sure this is not a software problem, because the software has been validated many times and worked very well.
I am more curious about the biological problem... could the primers not be working well? Could there be new species not annotated in any reference?
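For readers unfamiliar with the singleton/doubleton filtering mentioned above, here is a minimal sketch of the idea. The OTU names and read counts are made up for illustration, and `filter_rare_otus` is a hypothetical helper, not part of any specific pipeline.

```python
def filter_rare_otus(otu_counts: dict, min_reads: int = 3) -> dict:
    """Keep only OTUs supported by at least `min_reads` reads.

    With min_reads=3 this drops singletons (1 read) and doubletons (2 reads).
    """
    return {otu: n for otu, n in otu_counts.items() if n >= min_reads}


# Hypothetical total read counts per OTU, summed over all samples.
counts = {"OTU_1": 1520, "OTU_2": 1, "OTU_3": 2, "OTU_4": 37}

kept = filter_rare_otus(counts)
print(kept)
```

Rare OTUs are often enriched for sequencing errors and chimeras, so this kind of filter tends to reduce the OTU count, though it would not by itself explain OTUs that remain unassigned.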
Classify your OTU reps against Silva or RDP. Also, for partial (non-full-length) 16S sequences you should consider 99% or even 100% identity for OTU clustering; see e.g. https://www.biorxiv.org/content/early/2017/09/21/192211
99-100% is required only for species level. It doesn't mean the OP can't get valuable information with less than that.
I'm talking about the OTU clustering threshold, not the taxonomy annotation threshold.
Yes, I think this is too strict... if you have good-quality reads and a high number of reads, you can assign taxonomy to species level even with 97% OTU clustering.
Thank you all for your help. After considering your suggestions, I've decided to push up to 100% OTU clustering for species discovery. I know that I will lose a lot of biologically meaningful information, but since I am interested only in species, I think this is the best option :)
PS. I've checked the 100% OTU clustering threshold on a public mock community sample, and it works better than 97% OTU clustering.
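To make the clustering thresholds discussed above more concrete, here is a rough sketch of what 97%, 99%, and 100% identity mean in terms of mismatches. The amplicon length of 450 bp is only an assumed, illustrative value for a V3-V4 fragment, and the identity function ignores gaps; real clustering tools define identity on pairwise alignments and may count things differently.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def max_mismatches(length: int, threshold_pct: float) -> int:
    """Largest number of mismatches that still satisfies the identity threshold."""
    return math.floor(length * (100.0 - threshold_pct) / 100.0)


AMPLICON_LENGTH = 450  # assumed illustrative length, not taken from the thread

for threshold in (97.0, 99.0, 100.0):
    print(threshold, max_mismatches(AMPLICON_LENGTH, threshold))
```

Under these assumptions, two reads can differ at up to 13 positions and still fall into the same 97% OTU, while 100% clustering separates any two sequences that differ at all.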
If you're looking for a relative with 97% identity, you're likely not to find any match; it's too strict for soil samples. You should use other methods that take a broader look and can classify the sequences to higher taxonomic levels (genus, family, etc.), mothur for instance. You can try running some OTUs through the SILVA classifier as well.
Why is 97% too strict for soil samples?
Because the vast majority of the jungle in there has never been isolated; this is terra nova.
I don't think a 97% match is too strict for soil samples. OP is getting hits; it's just that they're to "uncultured bacteria". With RDP and SILVA at least, each 16S sequence in the database carries lineage information, e.g. "Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: Enterobacteriaceae: Escherichia; uncultured bacteria". So all OP has to do is look at the information at the genus or family level.
This kind of result only means that someone sequenced this 16S, predicted its lineage, and deposited it (or SILVA predicted the lineage of a deposited 16S sequence). Predicting lineage with matches below 97% is basically the same.
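As a rough illustration of reading the genus or family out of such a hit, here is a small sketch that parses the lineage string quoted above. The delimiter layout (ranks separated by ":" and the organism name after ";") simply follows that example; actual SILVA or RDP exports use their own formats, so `parse_lineage` and the `RANKS` list are assumptions you would need to adapt.

```python
# Rank names assumed to match the six levels in the example lineage.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def parse_lineage(lineage: str) -> dict:
    """Split 'Rank1: Rank2: ...; organism name' into a rank -> taxon mapping."""
    path, _, name = lineage.partition(";")
    taxa = [t.strip() for t in path.split(":") if t.strip()]
    result = dict(zip(RANKS, taxa))
    result["name"] = name.strip()
    return result


hit = (
    "Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: "
    "Enterobacteriaceae: Escherichia; uncultured bacteria"
)

parsed = parse_lineage(hit)
print(parsed["family"], parsed["genus"], "|", parsed["name"])
```

Even though the hit is named "uncultured bacteria", the lineage still places it in a family and genus, which is usually the level worth reporting for such OTUs.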
This is the information I was looking for. Thanks.