Spike-in control found in raw reads (16S amplicon sequencing) but not picked up by DADA2: where to go from here?
6 months ago

I used the nf-core/ampliseq pipeline to analyze 16S amplicon sequencing data. The lab added a standard spike-in control, but it doesn't appear in the DADA2 results.

I BLASTed the raw reads against the reference genome of the spike-in organism and confirmed that its 16S regions are present in the raw data (many hits).

I'm not sure where to go from here to figure out why these reads are not being picked up by DADA2. I checked the SILVA 138 database, and it contains ~700 16S sequences from the spike-in species.

Does anyone have suggestions for how to determine why the spike-in control isn't showing up in the ASV tables?

Tags: nf-core, 16S, amplicon, dada2, ampliseq

Was the spike-in a commercial product, e.g., from Zymo? Can you provide more information about what cells or DNA was spiked into your samples? What do you mean by not picked up by DADA2 (those genera or species were not found among your annotated sequence features)? What hypervariable region did you sequence?

It's a commercial product: a Staphylococcus hominis genome. This species is not found in the DADA2 results, so I guess DADA2 is not identifying ASVs from those reads in the cleaned data. There are ~700 S. hominis sequences in SILVA 138. I also found all of the 16S regions from the reference S. hominis genome in my raw reads.

6 months ago
Chris Dean

If you sequenced a short hypervariable region, most of your reads cannot be assigned to the species level. One reason is that the same hypervariable region sequence can occur in multiple species of the same genus, so it's impossible to know which species you're looking at (e.g., S. aureus, S. chromogenes, or S. hominis). Does this make sense?
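To make the ambiguity concrete, here is a minimal Python sketch of exact-match species assignment, similar in spirit to DADA2's species-level step (which, as far as I know, assigns species by exact matching against the reference). The helper names and the toy sequences are made up for illustration; they are not real 16S amplicons.

```python
from collections import defaultdict


def species_by_amplicon(references):
    """Group reference species by their exact amplicon sequence.

    `references` is an iterable of (species, amplicon) pairs, where the amplicon
    is the sequenced hypervariable region already trimmed out of each 16S gene.
    """
    groups = defaultdict(set)
    for species, amplicon in references:
        groups[amplicon.upper()].add(species)
    return groups


def assign_species(asv, groups):
    """Return every species whose amplicon exactly matches the ASV (may be several)."""
    return groups.get(asv.upper(), set())


# Toy, made-up sequences (NOT real 16S data): two species share an identical amplicon.
refs = [
    ("Staphylococcus aureus", "ACGTTGCA"),
    ("Staphylococcus hominis", "ACGTTGCA"),
    ("Staphylococcus chromogenes", "ACGTTGCT"),
]
groups = species_by_amplicon(refs)
hits = assign_species("acgttgca", groups)
if len(hits) > 1:
    print("ambiguous:", sorted(hits))
# → ambiguous: ['Staphylococcus aureus', 'Staphylococcus hominis']
```

An ASV matching both species cannot be resolved to one of them, no matter how many reference sequences the database holds.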

Ah, yes, this makes sense! I knew this was an issue with, say, running Kraken2 on shotgun metagenomic reads, but I don't have much experience with 16S. There is one hit to the genus Staphylococcus, assigned to S. aureus, but of course S. aureus is also part of the normal flora, so it's impossible to know what that hit actually represents.

The strange thing is that my analysis is based on a published paper, and in their GitHub repository they simply search for the hash corresponding to S. hominis and normalize by the read counts associated with that hash. I think that's why I assumed it was possible to find it. Do you have any thoughts on why the authors were able to find it? Did they just get lucky, with the ASV happening to be assigned to S. hominis in their case?

(Also, couldn't DADA2 assign those amplicons to multiple closely related species? That is, could I have S. hominis and S. aureus in my DADA2 results that actually come from one taxon and just weren't resolved correctly at the species level?)
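For context on the hash approach: I believe the ASV IDs in these pipelines are MD5 hashes of the ASV sequence (the QIIME 2 convention), so the same sequence always gets the same ID, and normalizing means dividing each sample's counts by the count of the spike-in ID. A minimal Python sketch, assuming that hashing convention and a hypothetical nested-dict count table; the names and data are illustrative only.

```python
import hashlib


def feature_id(sequence: str) -> str:
    # Assumption: the feature ID is the MD5 hex digest of the ASV sequence,
    # uppercased first so the same sequence always yields the same ID.
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def normalize_by_spike_in(counts: dict, spike_id: str) -> dict:
    """Divide each sample's feature counts by that sample's spike-in count.

    `counts` maps sample name -> {feature_id: read_count} (hypothetical layout).
    Samples without the spike-in feature are skipped, since the ratio is undefined.
    """
    normalized = {}
    for sample, features in counts.items():
        spike_reads = features.get(spike_id, 0)
        if spike_reads == 0:
            continue
        normalized[sample] = {fid: n / spike_reads for fid, n in features.items()}
    return normalized


# Toy example: made-up sequences and counts, not real data.
spike = feature_id("ACGTTGCA")  # hypothetical spike-in ASV
other = feature_id("TTGACCGA")  # hypothetical resident ASV
counts = {
    "sample1": {spike: 200, other: 1000},
    "sample2": {other: 500},  # spike-in not detected in this sample
}
print(normalize_by_spike_in(counts, spike))
```

This only works if the spike-in produces its own ASV. If its amplicon is identical to that of a species already in the sample, both collapse into one sequence and one hash, and the "spike-in" count silently includes the resident species.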

It's difficult to say how they were able to identify S. hominis without seeing the code or paper. To address your second question: yes, multiple species can be represented by the same ASV. This happens when two species in the same genus have an identical sequence over the hypervariable region that was sequenced. In other words, an ASV assigned to the genus Staphylococcus could actually represent one or more different Staphylococcus species, e.g., S. hominis and S. aureus.
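One way to see why such an ASV ends up labeled only to genus is a lowest-common-ancestor rule: when several species match equally well, report only the ranks they all share. This is a simplified illustration, not how DADA2's assignTaxonomy works (it uses a naive Bayesian classifier with bootstrap confidence), and the lineages below are abbreviated, hypothetical examples.

```python
def lowest_common_lineage(lineages):
    """Return the ranks shared by all lineages, stopping at the first disagreement.

    Each lineage is a list ordered from domain down to species.
    """
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Simplified, hypothetical lineages for two species with an identical amplicon.
aureus = ["Bacteria", "Firmicutes", "Staphylococcaceae", "Staphylococcus", "aureus"]
hominis = ["Bacteria", "Firmicutes", "Staphylococcaceae", "Staphylococcus", "hominis"]
print(lowest_common_lineage([aureus, hominis]))
# → ['Bacteria', 'Firmicutes', 'Staphylococcaceae', 'Staphylococcus']
```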

Thanks again. Hmm... this makes me doubt whether this normalization is actually possible. In case you'd like to take a look, here is the paper, and this is the code I'm referencing. In the paper, you can search for 'Staphylococcus hominis' to find the few sentences about how they did the spike-in and how they analyzed the result.

The MD5 hash they refer to on line 22 (f46a7ca244afef522b22a11bd33d27b1) appears to map to an S. aureus strain, not S. hominis (you can confirm this by looking at the hash on line 1912 of taxaspec_md5_fasta_blast.csv).
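If you want to repeat that check yourself, a small Python sketch like this lists every row of a CSV that contains a given hash. The file's column layout is not assumed (every field is compared), and `rows_with_hash` is a hypothetical helper, not part of their repository.

```python
import csv


def rows_with_hash(lines, md5):
    """Yield (line_number, row) for every CSV row with a field equal to `md5`.

    `lines` is an open text file or any iterable of CSV lines. Line numbers are
    1-based and include the header line, as reported by csv.reader.line_num.
    """
    reader = csv.reader(lines)
    for row in reader:
        if md5 in (field.strip() for field in row):
            yield reader.line_num, row
```

For example, passing `open("taxaspec_md5_fasta_blast.csv", newline="")` and `"f46a7ca244afef522b22a11bd33d27b1"` would list the matching row(s) with their line numbers, assuming the hash is stored as a separate field.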

They might have made a mistake in their script or uploaded the wrong version of it; it's hard to say, because I only spent a couple of minutes looking at it.

However, in my opinion, you wouldn't be able to pull this specific species out of the sample reliably, for the reason I mentioned in my previous post: the sequenced region can be identical across species in the same genus. I hope this helps. Good luck!

Thanks, Chris. I really appreciate your help. This has clarified a lot for me!