Question

How to explain uneven coverage of a DNA segment obtained via PCR amplification?

4

11.0 years ago

rohan ▴ 110

Experiment: deep sequencing for mutants in a 700 nt fragment.

The DNA fragment was pre-amplified with primers flanking it, then sequenced on a HiSeq.

Per-base coverage was calculated with: coverageBed -d -abam in.bam -b ref.bed > out.cov

Observation: two distinct peaks in coverage at the ends, as in the plot below (coverage vs. position).

[image: plot of coverage vs. position]

The peaks are made up of reads that contain part of the primers, so they also show soft clipping at the ends.

There is a huge difference in the calculations depending on whether I include or exclude such reads.

Question: does anyone know how to handle such a situation?

bedtools coverage • 5.8k views

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 11.0 years ago by rohan ▴ 110

1

Can you make that region wider? What happens further out? Also, can you indicate the primer locations?

ADD REPLY • link 11.0 years ago by Istvan Albert 102k

0

Shown above is the coverage of the 700 bp region of interest. Further out there is a steep decrease in coverage.

The primers flanked the region, extending ~10 nt outside and ~10 nt inside the target region, as shown below. [image: primer positions relative to the target region]

ADD REPLY • link 11.0 years ago by rohan ▴ 110

0

Is it possible that you are sequencing the primers there? Basically primer + Illumina adapter.

ADD REPLY • link 11.0 years ago by Istvan Albert 102k

0

The target region was gel purified after PCR, so this possibility is less likely. I identified mutants in those reads, so I think they are not coming from primers or adapters.

ADD REPLY • link 11.0 years ago by rohan ▴ 110

2

It is very easy to check your data for this. Count how many reads are primers followed by the Illumina adapter. You should remove these reads.
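A minimal sketch of that check, for illustration only: the function counts reads that start with a primer immediately followed by the beginning of the adapter. The primer sequences are placeholders (they are not given in this thread), and the adapter prefix `AGATCGGAAGAGC` is an assumption about the library kit; substitute the real sequences.

```python
def count_primer_adapter_reads(reads, primers, adapter_prefix="AGATCGGAAGAGC"):
    """Count reads that are a primer immediately followed by the adapter.

    reads          -- iterable of read sequences (strings)
    primers        -- primer sequences as they appear at read starts
                      (hypothetical values in the demo below)
    adapter_prefix -- first bases of the adapter (assumed; kit dependent)
    """
    # Build each primer+adapter junction once, upper-cased for comparison.
    junctions = [(p + adapter_prefix).upper() for p in primers]
    hits = 0
    total = 0
    for seq in reads:
        total += 1
        if any(seq.upper().startswith(j) for j in junctions):
            hits += 1
    return hits, total


if __name__ == "__main__":
    # Placeholder primers and toy reads, not real data.
    primers = ["ACGTACGTACGTACGTACGT", "TTGCATTGCATTGCATTGCA"]
    reads = [
        "ACGTACGTACGTACGTACGT" + "AGATCGGAAGAGCACACGTC",  # primer + adapter
        "ACGTACGTACGTACGTACGT" + "GGATTCCAGTCAGGTACCAT",  # primer + insert
        "CCATGGATTCAGGCATTACGGATCAGGATC",                  # insert only
    ]
    hits, total = count_primer_adapter_reads(reads, primers)
    print(f"{hits} of {total} reads are primer followed by adapter")
```

Exact prefix matching is the simplest possible test; reads with sequencing errors in the primer or adapter would be missed, so the count is a lower bound.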

ADD REPLY • link 11.0 years ago by Istvan Albert 102k

Ram · Answer 1 · 2014-04-08

2

11.0 years ago

Andreas ★ 2.5k

Hi,

we see those peaks in ultra-high-coverage viral amplicon sequencing as well. You need to ignore the primer positions for a good number of reasons. One is that you are interested in the amplified target region, not the primer regions. The latter will by definition be largely identical to the primers used, but not necessarily to the target sequences (where the primers might have bound imperfectly at first). One can often detect false-positive low-frequency variants covering primer positions. Furthermore, the huge coverage bias might negatively affect downstream analysis (by the way: Picard's MarkDuplicates will likely not help here).

The sharp coverage drop can be caused by your sequencing setup. For example, let's say this was a larger region and you fragmented before sequencing. While fragment ends would normally be equally distributed across the region, you will always see a fragment end at the primer start, which is where you then see the sharp drop/increase.
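Ignoring the primer positions can be sketched as a filter on the per-base coverage table. This assumes tab-separated lines for a single interval whose last two fields are the 1-based position within the interval and the depth (the layout expected from `coverageBed -d`; verify against the actual file). The mask sizes are hypothetical parameters, to be set from the known primer lengths.

```python
def coverage_without_primers(lines, left_mask, right_mask):
    """Return [(position, depth)] with primer-covered positions dropped.

    lines      -- tab-separated per-base coverage lines for ONE interval;
                  assumed layout: chrom, start, end, ..., position, depth,
                  with position 1-based within the interval
    left_mask  -- number of positions to drop at the left end (hypothetical)
    right_mask -- number of positions to drop at the right end (hypothetical)
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        interval_len = int(fields[2]) - int(fields[1])
        pos, depth = int(fields[-2]), int(fields[-1])
        # Keep only positions strictly inside the two masked ends.
        if left_mask < pos <= interval_len - right_mask:
            kept.append((pos, depth))
    return kept


if __name__ == "__main__":
    # Toy 10 bp interval with inflated depth at both ends (made-up numbers).
    depths = [900, 900, 100, 100, 100, 100, 100, 100, 900, 900]
    lines = [f"amp\t0\t10\t{i}\t{d}\n" for i, d in enumerate(depths, start=1)]
    for pos, depth in coverage_without_primers(lines, left_mask=2, right_mask=2):
        print(pos, depth)
```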

Just my two cents,

Andreas

ADD COMMENT • link 11.0 years ago by Andreas ★ 2.5k

0

Thanks a lot for the suggestion, this is interesting. So the shearing would cause this uneven coverage at the ends? I am not a wet lab guy, but I think ideally it should not, because shearing is supposed to be random.

I also wondered whether incomplete cycles in the pre-amplification could have produced such 200 bp fragments, but that is not the case either, as all cycles had ample extension time (1 min for Thermo Phusion polymerase).

ADD REPLY • link 11.0 years ago by rohan ▴ 110

0

I found this discussion, which is very similar to my case: Samtools + Picard MarkDuplicates

It recommends removing the duplicates, but as you wrote, "by the way: Picard's MarkDuplicates will likely not help here". I wonder why that would be?

ADD REPLY • link 11.0 years ago by rohan ▴ 110

0

If you have high coverage data then MarkDuplicates will likely remove pretty much all of your reads, because they all look identical. This can also skew SNV frequencies in downstream analysis.
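A toy illustration of why this happens, assuming duplicates are identified purely by alignment position and strand. This is a simplified stand-in for position-based duplicate marking, not Picard's actual algorithm, and the read counts and coordinates are made up.

```python
def dedupe_by_position(alignments):
    """Keep one alignment per (chrom, 5' start, strand) key.

    A simplified stand-in for position-based duplicate marking.
    """
    seen = set()
    kept = []
    for chrom, start, strand in alignments:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


if __name__ == "__main__":
    # Unsheared amplicon reads all begin at one of the two amplicon ends.
    amplicon_reads = [("amplicon", 0, "+")] * 5000 + [("amplicon", 699, "-")] * 5000
    kept = dedupe_by_position(amplicon_reads)
    print(len(amplicon_reads), "reads ->", len(kept), "after deduplication")
```

Because every read from an unsheared amplicon shares one of two start positions, almost all of them collapse into a single representative per end.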

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.8 years ago by Andreas ★ 2.5k

score 1 · Answer 2 · 2014-04-08

1

11.0 years ago

seidel 11k

there is a huge difference in the calculations...

The calculations of what? Are you trying to identify the mutants? Or quantify them? (different questions, unless you're trying to do both). I think Istvan is right (and as you describe), you have sequence from the primers, which are there at higher concentration (by sequence) than the insert fragment. If you know what they are, why not trim them off? I can't really see a reason not to.
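Trimming could look roughly like the sketch below, which removes a known primer from the start of a read by exact match. The primer list is hypothetical, and exact matching is a simplifying assumption; dedicated trimming tools typically tolerate mismatches.

```python
def trim_leading_primer(seq, qual, primers):
    """Remove a known primer from the 5' end of a read, if present.

    seq, qual -- read sequence and its per-base quality string
    primers   -- known primer sequences (hypothetical values in the demo)
    Returns (seq, qual) unchanged when no primer matches exactly.
    """
    upper = seq.upper()
    for primer in primers:
        if upper.startswith(primer.upper()):
            n = len(primer)
            return seq[n:], qual[n:]
    return seq, qual


if __name__ == "__main__":
    primers = ["ACGTACGTAC"]  # placeholder primer
    seq, qual = trim_leading_primer("ACGTACGTACGGATTC", "IIIIIIIIIIFFFFFF", primers)
    print(seq, qual)
```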

ADD COMMENT • link 11.0 years ago by seidel 11k

0

By calculations, I meant the normalization of the frequency of mutants in the pool:

normalized frequency of mutant = absolute frequency / coverage at that position

This is where the coverage introduces bias.

I will remove these reads; I will still get enough depth to work with.

Thanks for the suggestion.
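The normalization above, as a small sketch with made-up numbers, shows how much the result shifts when primer-containing reads inflate the coverage at a position. The function name and the dictionaries are illustrative only.

```python
def normalized_frequencies(mutant_counts, coverage):
    """Compute mutant count / coverage for each position.

    mutant_counts -- dict: position -> number of reads carrying the mutant
    coverage      -- dict: position -> read depth at that position
    Positions with zero or missing coverage are skipped.
    """
    freqs = {}
    for pos, count in mutant_counts.items():
        depth = coverage.get(pos, 0)
        if depth == 0:
            continue
        freqs[pos] = count / depth
    return freqs


if __name__ == "__main__":
    mutant_counts = {15: 50}          # made-up: 50 mutant reads at position 15
    with_primer_reads = {15: 100000}  # made-up depth including primer reads
    without_primer_reads = {15: 20000}  # made-up depth after removing them
    print(normalized_frequencies(mutant_counts, with_primer_reads))
    print(normalized_frequencies(mutant_counts, without_primer_reads))
```

If the mutant reads themselves are among those removed, the numerator has to be recounted from the filtered reads as well.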

ADD REPLY • link 11.0 years ago by rohan ▴ 110

Ram · Answer 3 · 2014-06-20

two distinct peaks in coverage at the ends

That's totally normal. Every single molecule of your PCR product, after all, has a nice neat end right there already. You can order special PCR primers with "blockers" to curtail that behavior. You will also observe very few reads aligning quite close to the edge; apparently, shearing very rarely happens, say, 20 bases away from the end of the amplicon.

Remember that the sequence under the primers will literally be primer; if there is a mutation under the primer, you will never see it. So you don't need to worry about calling SNPs in those bases.