Hello,
Amplicon sequencing was performed across 9 samples, and I then used DADA2 to analyze the variants. 83 variants were found, and my top 5 are below:
1.AGTGCTTAAAGACTATTGTTTTATTCCCAAATTGTTCTCTTAATTTTATAACTATCTTATTTAAAGTGTCATTCCATTTTGCTCTACTAAGGTTACAATGTGCTTGTCTTATATCTCCTATTATTTCTCCTGTTGTATAAAATGCTCTGCCTGGTCCTATATTTATACTTTTT
2.ATTGCTTAAAGATTATTGTTTTATTATTTCCAAATTGTTCTCTTAATTTGCTAGCTATCTGTTTTAAAGTGGCATTCCATTTTGCTCTACTAATGTTACAATGTGCTTGTCTCATATTTCCTATTTTTCCTATTGTAACAAATGCTCTCCCTGGTCCCCTCTGGATACGGATACTTTTT
3.GTGCTTAAAGACTATTGTTTTATTCCCAAATTGTTCTCTTAATTTTATAACTATCTTATTTAAAGTGTATCTCCTATTATTTCTCCTGTTGTATAAAATGCTCTGCCTGGTCCTATATTTATACTTTTT
4.AGTGCCTGGTCCTATATTTATACTTTTT
5.AGTGCTTAAAGACTATTGTTTTATTAAAATGCTCTGCCTGGTCCTATATTTATACTTTTT
The first three are as expected, but what are these very short sequences and what causes them? I imagine they are sequencing errors, but what exactly is happening here to give such high counts for so many short sequences?
Thank You, Sara
I expect those short snippets really are present in your library, rather than just showing up due to your sequencing or dada2 usage. Take a look at how they compare in an MSA:
Whenever we do amplicon sequencing we get some fraction of reads that just match one, the other, or both primer sequences, and people have told me that's to be expected due to primer dimer and the like. When all works well it's just a small fraction, but when the starting material is degraded or low abundance we see a lot more of that. Could that explain what you're seeing? For example I notice sequence #4 is nearly identical to the end of most of the others (reverse primer?) and #5 looks like the start and end put together (primer dimer?).
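If it helps to check this without an alignment viewer, here is a minimal Python sketch that measures how much of a short variant matches the start and the end of a full-length one. The function names are made up for illustration, and the sequences are variants 1, 4 and 5 as posted in the question. It only does exact matching from each end, so treat it as a rough check rather than a real alignment.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share exactly."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a, b):
    """Number of trailing characters that a and b share exactly."""
    return common_prefix_len(a[::-1], b[::-1])


def describe_short_variant(short, full):
    """Summarise how a short variant relates to the two ends of a full-length one."""
    prefix = common_prefix_len(short, full)
    suffix = common_suffix_len(short, full)
    return {
        "length": len(short),
        "prefix_match": prefix,
        "suffix_match": suffix,
        # True when the start and end matches together cover the whole short read,
        # i.e. it looks like the two ends of the amplicon joined together.
        "ends_cover_read": prefix + suffix >= len(short),
    }


# Variants 1, 4 and 5 from the question.
SEQ1 = ("AGTGCTTAAAGACTATTGTTTTATTCCCAAATTGTTCTCTTAATTTTATAACTATCTTATTTAAAGTGTCATTCC"
        "ATTTTGCTCTACTAAGGTTACAATGTGCTTGTCTTATATCTCCTATTATTTCTCCTGTTGTATAAAATGCTCTGC"
        "CTGGTCCTATATTTATACTTTTT")
SEQ4 = "AGTGCCTGGTCCTATATTTATACTTTTT"
SEQ5 = "AGTGCTTAAAGACTATTGTTTTATTAAAATGCTCTGCCTGGTCCTATATTTATACTTTTT"

for name, seq in [("variant 4", SEQ4), ("variant 5", SEQ5)]:
    print(name, describe_short_variant(seq, SEQ1))
```

A short variant whose start and end matches together cover its whole length is consistent with the primer-dimer style explanation above, although it does not prove it.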
Hey Sara,
Some kind of quality filtering of the raw reads is necessary for DADA2, because low-quality reads will negatively affect the accuracy of its error-model estimates. So the actual number of ASVs can be very different depending on how the raw sequencing data have been processed.
That said, I don't know why you get very short ASVs, but the DADA2 community is very active, and you should probably ask there, explaining every step used to process the raw sequencing data and generate the ASVs.
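To make the quality-filtering point concrete, here is a rough Python sketch of an expected-error filter, the same idea behind the `maxEE` option of DADA2's `filterAndTrim`. This is not DADA2 code: the function names, the Phred+33 encoding, and the `max_ee` and `min_len` thresholds are all assumptions picked for illustration, and should be adjusted to your own data.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string.

    Assumes Phred+33 encoding (hypothetical default), where Q = ord(char) - 33
    and the error probability of a base is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(seq, quality, max_ee=2.0, min_len=50):
    """Keep a read only if it is long enough, has no N, and has few expected errors.

    max_ee and min_len are illustrative thresholds, not recommended values.
    """
    if len(seq) < min_len:
        return False
    if "N" in seq:
        return False
    return expected_errors(quality) <= max_ee


# Toy reads: (sequence, quality string). 'I' is Q40, '#' is Q2 in Phred+33.
reads = [
    ("ACGT" * 20, "I" * 80),   # long and high quality
    ("ACGT" * 20, "#" * 80),   # long but very low quality
    ("ACGT" * 5, "I" * 20),    # high quality but too short
]

for seq, qual in reads:
    print(len(seq), round(expected_errors(qual), 4), passes_filter(seq, qual))
```

A minimum-length cutoff like `min_len` would also drop reads as short as variants 4 and 5 before denoising, but whether that is appropriate depends on the expected amplicon length.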