Lack of early variants
7.5 years ago
L. A. Liggett

I am performing amplicon sequencing, so the regions of the human genome I have captured all align to exactly the same loci. I am seeing a phenomenon I had not expected: within a given amplicon region I see a couple of mutations, then a stretch where few if any mutations are found, and then a lot of mutations, as expected. I have included a hypothetical example that illustrates the point. My question is why I would see a gap towards the left of the plot, which is earlier in the genome. Is this because early variants cause a strand not to align to the reference, or something like this?

I am using bwa mem and freebayes for alignment and variant calling.

See: https://goo.gl/photos/uRqUTMWR66fzbqRQA

sequencing • 2.1k views
Another possibility would be that this region is extremely well conserved.

Without going into a lot of detail, this does not appear to be the case, especially because I see the same thing over multiple amplicons.

You should also look at the alignment files themselves and investigate them.

Sorry, I'm not sure I follow. Do you mean that I should check the alignment files to see whether the regions are well covered, or something like that?

Yes, this relates to what WouterDeCoster said. If your region is well covered and all the reads align with no mismatches then there is the answer for the lack of mutations.
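To make "well covered and all the reads align with no mismatches" concrete, here is a minimal pure-Python sketch that tallies per-position depth and mismatch counts. It assumes reads are given as hypothetical `(start, sequence)` pairs already aligned gaplessly to the reference (no indels, no soft clipping); real BAM alignments would need CIGAR handling, which this sketch deliberately skips. A run of positions with high depth but zero mismatches is the pattern described above.

```python
from typing import List, Tuple


def per_position_stats(reference: str, reads: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Return a (depth, mismatches) pair for every reference position.

    Assumption: each read is (start, sequence), aligned without gaps
    starting at the 0-based reference offset `start`.
    """
    depth = [0] * len(reference)
    mismatches = [0] * len(reference)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < 0:
                continue  # skip bases hanging off the left end of the reference
            if pos >= len(reference):
                break  # ignore bases running past the right end of the reference
            depth[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
    return list(zip(depth, mismatches))


ref = "ACGTACGTAC"
reads = [(0, "ACGTACGTAC"), (0, "ACGTACCTAC"), (2, "GTACGTAC")]
stats = per_position_stats(ref, reads)
print(stats[6])  # three reads cover position 6, one of them disagrees
```

Here position 6 comes out as `(3, 1)`: three reads cover it and one carries a mismatch, while position 0 is `(2, 0)`.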

Right. The tricky thing here is that I am using amplicon sequencing. Unlike WGS on sheared DNA, this is targeted amplification, meaning that all reads should align perfectly. Therefore, if a variant is identified in later regions of the DNA, the earlier regions must also have been covered, so the coverage should be identical across a given locus.

Unless I'm missing something?

I don't understand why you would expect perfect alignments.

The DNA that gets sequenced lies between the probes and will contain variation. Perhaps you mean that the start of the read ought to be a perfect match; it is unclear.

In general, whether you are using WGS or a targeted approach is not relevant, in my opinion.

You are aligning against a reference genome that is different from the DNA that you amplified. You clearly have variation there, no? And the goal is to find out what the differences are.

What I was suggesting above is that one needs to always assess the alignments as well. That's how we can tell what is going on with the data.

7.5 years ago

This result is expected if your sequence data overlaps the amplicon primer. Primer oligonucleotides contain errors; those containing mismatches toward the 5' end can still anneal and amplify, but mismatches toward the 3' end cannot.
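The answer's reasoning can be illustrated with a deliberately simplified toy model. It assumes that a primer extends only if no mismatch falls within a hypothetical `three_prime_window` of bases at its 3' end, and that the resulting read starts with the primer's own sequence rather than the template's. Real annealing and extension also depend on the polymerase, the mismatch type and the melting temperature, and strand orientation is ignored here (primer and template are written in the same 5'->3' orientation). Under these assumptions, a template variant under the primer is either overwritten by the primer sequence (5' mismatch) or blocks amplification entirely (3' mismatch), so it never shows up in the reads.

```python
from typing import Optional


def primer_can_extend(primer: str, site: str, three_prime_window: int = 3) -> bool:
    """Toy model: extension fails if any mismatch lies in the last
    `three_prime_window` bases (the 3' end) of the primer."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    n = len(primer)
    for i, (p, t) in enumerate(zip(primer, site)):
        if p != t and i >= n - three_prime_window:
            return False
    return True


def amplicon_read(primer: str, template: str, three_prime_window: int = 3) -> Optional[str]:
    """Return the read this template would yield, or None if not amplified.

    The primer footprint in the read comes from the primer itself,
    so it reflects the primer (designed on the reference), not the sample.
    """
    site = template[: len(primer)]
    if not primer_can_extend(primer, site, three_prime_window):
        return None
    return primer + template[len(primer):]


reference = "ACGTTGCAAGGT"
primer = reference[:6]  # hypothetical primer designed on the reference

masked = amplicon_read(primer, "AGGTTGCAAGGT")     # variant near the primer's 5' end
blocked = amplicon_read(primer, "ACGTTCCAAGGT")    # variant at the primer's 3' end
visible = amplicon_read(primer, "ACGTTGCATGGT")    # variant downstream of the primer
```

In this sketch `masked` equals the reference (the variant disappears), `blocked` is `None` (that template is not amplified), and only `visible` still carries its variant.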

Hmm, this sort of makes sense to me. But there will be a primer at both ends of the amplicon, and the reads will be sequenced from both ends. So would I not see the same lack-of-mutations pattern at each end?

It depends on how the libraries were constructed. If you have paired-end sequence data, the second amplicon primer overlaps the sequence data in the same manner as primer one, and the melting temperature of primer two is the same as that of primer one, then you would expect to see the same phenomenon at the start of read two.

Edit: If the answer resolves your question, you should accept it so that future readers can assess the utility of the response.
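One way to check whether the variant-free gaps coincide with the primer footprints at either end of an amplicon is to split the variant calls by position. This is a minimal sketch: the amplicon coordinates and primer lengths below are hypothetical placeholders that would, in practice, come from your amplicon design, and positions are 0-based with the amplicon spanning the half-open interval `[amplicon_start, amplicon_end)`.

```python
from typing import Iterable, List, Tuple


def split_calls_by_primer(
    variant_positions: Iterable[int],
    amplicon_start: int,
    amplicon_end: int,
    fwd_primer_len: int,
    rev_primer_len: int,
) -> Tuple[List[int], List[int]]:
    """Split 0-based variant positions into (inside_primer, outside_primer).

    The forward primer is assumed to occupy the first `fwd_primer_len`
    bases of the amplicon and the reverse primer the last `rev_primer_len`.
    """
    fwd_lo, fwd_hi = amplicon_start, amplicon_start + fwd_primer_len
    rev_lo, rev_hi = amplicon_end - rev_primer_len, amplicon_end
    inside: List[int] = []
    outside: List[int] = []
    for pos in variant_positions:
        if fwd_lo <= pos < fwd_hi or rev_lo <= pos < rev_hi:
            inside.append(pos)
        else:
            outside.append(pos)
    return inside, outside


# Hypothetical amplicon [100, 300) with a 20 bp forward and 22 bp reverse primer.
inside, outside = split_calls_by_primer([105, 130, 270, 295], 100, 300, 20, 22)
```

With these made-up numbers, positions 105 and 295 fall under a primer while 130 and 270 do not. If the real calls under the primer footprints are consistently sparse at both ends, that would be consistent with the explanation above; a common remedy is to trim primer sequences from the reads before variant calling.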
