Question

Why Does Mothur Require An Alignment Step Before Clustering?

4

11.3 years ago

vijay ★ 1.6k

Dear all,

I am trying to cluster my sequence dataset so that I can determine the number of OTUs at various percent identities. I tried QIIME and it worked well. However, I would also like to use the mothur workflow to cluster the same dataset, and I see that mothur requires the dataset to be aligned before clustering.

Is there a way to skip the alignment step in mothur and cluster the sequences the way QIIME does, or is this an oversight on my side?

clustering • 8.2k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 11.3 years ago by vijay ★ 1.6k

4

You might want to check out my ISME J paper on why an alignment (to a reference) is important for analyzing 16S rRNA gene sequences (http://www.ncbi.nlm.nih.gov/pubmed/23018771). So in short, the requirement for alignment is a feature, not a bug. Now if you have ITS sequences, that's another ball of wax and I frankly wonder how those folks can align their sequences to anything or even use OTUs to do community ecology analyses.

ADD REPLY • link 10.8 years ago by pschloss ▴ 300

0

I am having the same problem.

I am using ITS on fungi. I find mothur useful, and it serves as a great tool to use instead of or alongside QIIME.

However, there is a HUGE world of research using the ITS regions for fungi. We don't align, and we get great and useful data. Not recognizing this, and ignoring the fungal world, is in my opinion significantly restricting mothur's utility to the microbial community.

Can we use mothur to cluster without requiring alignments, or not?

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by rosenstockn • 0

Ram · Answer 1 · 2013-09-10

4

11.3 years ago

Istvan Albert 102k

The uclust method in QIIME (probably what you are using) also works by using alignments; it is just that the process is internal to the tool and not visible.

In mothur the steps are more explicitly laid out.

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 11.3 years ago by Istvan Albert 102k

0

Thanks for your response, Istvan. I understand that. But my point is that the number of OTUs generated varies between QIIME and MOTHUR for the same dataset at 97%. So I wonder whether the alignment against a SILVA reference that is done in mothur causes this difference.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 11.3 years ago by vijay ★ 1.6k

2

OK, but that is a different question altogether.

Of course the number of OTUs will vary; the methods are very different. Just because both use alignments does not make them identical: they align against different targets. UCLUST aligns sequences to one another with a BLAST-like algorithm and clusters by centroid. Mothur aligns the sequences against a standardized reference via the NAST algorithm, then clusters by that similarity.
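
To make the centroid idea concrete, here is a minimal Python sketch of greedy centroid clustering. It is not the UCLUST implementation: the `identity` function is a hypothetical stand-in built on `difflib.SequenceMatcher` rather than a real pairwise aligner, and `greedy_centroid_clusters` is a name made up for this illustration.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Toy stand-in for pairwise identity (NOT the scoring UCLUST uses)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_centroid_clusters(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches, in input order."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is similar enough: seq seeds a new cluster.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
    for centroid, members in greedy_centroid_clusters(reads, 0.85).items():
        print(centroid, len(members))
```

In this sketch the result depends on the order of the input, because the first sequence seen in a neighbourhood becomes its centroid. That alone is one reason a greedy centroid method and a distance-matrix method can report different OTU counts for the same data.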

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 11.3 years ago by Istvan Albert 102k

0

Did you perform chimera filtering with both pipelines? It might explain at least some of the differences.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 11.3 years ago by Manu Prestat 4.1k

Ram · Answer 2 · 2013-09-10

Just to reiterate what Istvan Albert wrote in his answer, you use alignments to infer identical species in your mixed amplicon sequencing experiment. You run quality control (read quality) and chimera checks in both pipelines, and these will not perform equally, since QIIME and MOTHUR use different QC algorithms. In addition, you may or may not remove exactly matching redundant sequences, depending on your pipeline.
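
As an illustration of the "exactly matching redundant sequences" step, the following Python sketch collapses identical reads and keeps their abundances. The function name `dereplicate` is hypothetical, and neither pipeline is claimed to do it exactly this way.

```python
from collections import Counter


def dereplicate(seqs):
    """Collapse exact duplicates (case-insensitive) into (sequence, count) pairs.

    Pairs are ordered from most to least abundant.
    """
    counts = Counter(s.upper() for s in seqs)
    return counts.most_common()


if __name__ == "__main__":
    reads = ["ACGT", "acgt", "TTGA", "ACGT"]
    for seq, n in dereplicate(reads):
        print(seq, n)
```

Whether this step runs, and whether it runs before or after quality filtering, changes the set of sequences that reaches the clustering step.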

If you construct your sequence alignment at this step, you will have different sequences (in QIIME vs. MOTHUR) with which to construct your alignment matrix. You perform an alignment because you want to infer homology and find polymorphisms in your amplicon sequences. If you've ever inspected the output from alignment programs, you'll see that they never (or, in my opinion, only rarely) construct the exact same alignment when run on the same dataset.
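
A small Python sketch shows why the alignment matters for the numbers that follow: once two sequences share the same column coordinates, their distance is a column-by-column comparison, so a different alignment gives a different distance. The function `aligned_distance` is hypothetical, and the gap handling here (every gap-versus-base column counts as one mismatch) is a simplifying assumption, not the rule either tool applies.

```python
def aligned_distance(a: str, b: str, gap_chars: str = "-.") -> float:
    """Fraction of differing columns between two already-aligned sequences.

    Columns that are gaps in both sequences carry no information and are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars and y in gap_chars:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


if __name__ == "__main__":
    # Same two reads, two different placements of the gap in the first read.
    print(aligned_distance("AC-GTTA-", "ACAGTCA-"))
    print(aligned_distance("ACG-TTA-", "ACAGTCA-"))
```

The two calls compare the same pair of reads and differ only in where the gap was placed, yet they return different distances. Near a 97% cutoff, a shift like that can move a sequence into or out of an OTU.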

All these factors contribute to why it's difficult to compare OTU numbers derived from QIIME and MOTHUR, or even from the same pipeline run at different times.

Also, both MOTHUR and QIIME construct alignments for that particular aspect of the analysis, so if you want to estimate OTUs you will have to run an alignment. Usually this is not a problem except if you have errant sequences which are not homologous to your amplicon of interest.