Experiment: deep sequencing for mutants in a 700 nt fragment.
The DNA fragment was pre-amplified with primers flanking it and then sequenced on a HiSeq.
Per-base coverage was calculated with: coverageBed -d -abam in.bam -b ref.bed > out.cov
Observation: two distinct peaks in coverage at the ends, as in the plot below (coverage vs. position).
The peaks come from reads that contain part of the primers, so these reads also show soft clipping at their ends.
The results differ hugely depending on whether I include or exclude such reads.
Question: does anyone know how to handle this situation?
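To make the include/exclude comparison concrete, here is a minimal sketch of how the soft-clipped reads could be separated from the rest using only the CIGAR strings of the alignments. The function names and the `min_clip` threshold are my own for illustration, not part of any tool, and the CIGAR strings would have to be extracted from the BAM file first.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_ends(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

def split_by_soft_clip(cigars, min_clip=1):
    """Split CIGAR strings into (kept, clipped) by soft-clip length.

    min_clip is a hypothetical threshold: a read counts as clipped when
    either end is soft-clipped by at least this many bases.
    """
    kept, clipped = [], []
    for cigar in cigars:
        left, right = soft_clipped_ends(cigar)
        (clipped if max(left, right) >= min_clip else kept).append(cigar)
    return kept, clipped

kept, clipped = split_by_soft_clip(["100M", "10S90M", "95M5S"])
print(len(kept), "unclipped;", len(clipped), "soft-clipped")
```

Computing coverage separately on the two groups would show how much of each peak comes from the clipped reads.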
Can you make that region wider? What happens further out? Also, can you indicate the primer locations?
Shown above is the coverage of the 700 bp region of interest. Further out there is a steep decrease in coverage.
The primers flank the target region, extending ~10 nt outside and ~10 nt inside it, as shown below.
Is it possible that you are sequencing the primers there? Basically primer + Illumina adaptor.
The target region was gel-purified after PCR, so this is less likely. I also identified mutants in those reads, so I don't think they come from primers or adapters.
It is very easy to check your data for this: count how many reads are a primer followed by the Illumina adapter. You should remove these reads.
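A rough sketch of that check, assuming the read sequences have already been pulled out of the FASTQ or BAM file. The primer used below is a placeholder, not your real primer, and `AGATCGGAAGAGC` is the start of the common Illumina TruSeq adapter, which you should confirm against your library prep kit. The function name and the `max_offset` parameter are mine, and only exact matches are counted, so sequencing errors inside the primer or adapter would be missed.

```python
def count_primer_adapter_reads(reads, primers, adapter, max_offset=0):
    """Count reads that contain a primer immediately followed by the adapter.

    max_offset (hypothetical parameter) is how far from the read start
    the primer is allowed to begin; 0 means the read must start with it.
    """
    hits = 0
    for seq in reads:
        seq = seq.upper()
        for primer in primers:
            motif = (primer + adapter).upper()
            pos = seq.find(motif)
            if 0 <= pos <= max_offset:
                hits += 1
                break  # count each read once, even if both primers match
    return hits

# Placeholder sequences for illustration only.
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumed TruSeq adapter start
PRIMERS = ["ACGTACGT"]            # substitute the real primer sequences

example_reads = [
    "ACGTACGTAGATCGGAAGAGCTTTT",  # primer + adapter
    "ACGTACGTGGGGCCCCAAAATTTT",   # primer followed by insert
]
print(count_primer_adapter_reads(example_reads, PRIMERS, ADAPTER_PREFIX))
```

If the count is a large fraction of the reads under the peaks, that would support the primer-dimer explanation. If it is near zero, the peaks are more likely real amplicon ends.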