Identifying shared exact sequence variants between two datasets
5.9 years ago

I have two microbiome datasets that were sequenced from different sites within the same environment. I would like to identify which amplicon sequence variants (ASVs) inferred using DADA2 are shared between the two datasets and which are unique to each. The problem is that I'm not sure of the best way to do this. So far I have tried megablast, looking only at exact matches between the two datasets, but perhaps there is a better way, for example another alignment tool or an alignment-free method. Unfortunately I don't have the experience to decide on the best method. Does anyone with more experience of working with microbiome data know of a better way?

alignment dada2 16s microbiome NGS • 1.7k views

"Amplicon sequence variants": do you mean the common and different variants between the two datasets? Does your organism have a reference genome available?


The term is just what DADA2 uses to refer to each individual 16S rRNA sequence representing a "species" of bacteria; they are not variants in the genetics sense. Essentially yes, I wish to find the common and differing sequences between the two datasets.

5.9 years ago
h.mon 35k

You can use cd-hit-est-2d for this task. The VSEARCH wiki also has an example of how to use VSEARCH to (among other things) compare datasets.
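
If exact matches are all that is needed, a plain set comparison may be enough before reaching for a clustering tool. The following is only a minimal sketch, assuming both datasets were amplified with the same primers and trimmed to the same region and length, so that identical organisms give identical ASV strings. The function names (`read_fasta`, `asv_set`, `compare_asvs`) and the toy sequences are hypothetical and are not part of DADA2, CD-HIT, or VSEARCH.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def asv_set(handle):
    """Collect the distinct sequences of one dataset, upper-cased."""
    return {seq.upper() for _, seq in read_fasta(handle)}


def compare_asvs(handle_a, handle_b):
    """Split ASVs into shared and dataset-specific sets by exact match."""
    a, b = asv_set(handle_a), asv_set(handle_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy data standing in for the two ASV FASTA files (assumption: in
# practice these would be files exported from each DADA2 run).
site_a = StringIO(">ASV1\nACGTACGT\n>ASV2\nTTGGCCAA\n")
site_b = StringIO(">ASV9\nttggccaa\n>ASV10\nGGGGCCCC\n")

result = compare_asvs(site_a, site_b)
for label, seqs in result.items():
    print(label, sorted(seqs))
```

With real data, the `StringIO` objects would be replaced by `open("site_a.fasta")` and `open("site_b.fasta")` (hypothetical file names). If the two runs were truncated to different lengths, exact string comparison will miss true matches, and an identity-based search with one of the tools above is probably the safer route.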

5.9 years ago
Charles Yin ▴ 180

You could use a sliding-window approach to align windowed sequences from one genome onto the other genome.


The OP's data are amplicon sequences, probably 16S, not whole genomes.
