I have a bunch of sequences obtained from a metagenomic experiment targeting the 16S gene of bacteria. The primers are designed to span from V3 to V4. The 16 rDNA gene is relatively well conserved, nonetheless I expect some variation in the size of this region. However, I'm seeing pretty large variations in the size of these fragments as you can see in the graph below. I would have thought that the distribution was going to be more compact. Where could I check for the sanity of this data ? I can't seem to find any sources on the amplitude of the naturally occurring length variation of the bacterial 16S gene.
EDIT - A few clarification to answer question from @Mabeuf:
- The source sample is a lake sediment core (highly phosphorus-saturated sediments) from which DNA was extracted.
- The primers used are 341F
5' -CCTACGGGNGGCWGCAG-3'
and 805R5' -GACTACHVGGGTATCTAATCC-3'
and was run with no other sample on an Illumina MiSeq. - Subtracting those two numbers 805 - 341 = 464 base pairs would fall right between the two right peaks on the graph.
I edited my question to provide more clarifications ! So you do think that the spread is too large to be of natural cause ?