What's your read length? Is it paired-end or single-end?
If your sequencing is paired-end, e.g. 2x150, then for reads from DNA templates shorter than 150 bp you may find adapter bases at the tails of both read1 and read2.
How do you find these adapters? There are 3 different overlap patterns, depending on the DNA template length (TLEN):
1. Not overlapped, TLEN > 2x150
ATCGATTTAGTTT...ATTAGGGATTA
------------------------------------------------------------TGTAATCGTAGT...AATACGATCGA
2. Overlapped, 150 < TLEN < 2x150
ATCGATTTAGTTT...ATTAGGGATTA
-------------------...-----------AGGGATTACTATCT...AGATTC
3. Overlapped, TLEN < 150
ATCGATTTAGTTT...ATTAGGGATTA-adapter
ATCGATTTAGTTT...ATTAGGGATTA-adapter
Try to search for read pairs matching pattern 3 and count the adapter bases. The count should be close to 0 if the data is really clean.
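A minimal Python sketch of that check, assuming both reads have the same length L and are plain uppercase ACGTN strings. It aligns read1 against the reverse complement of read2 by brute force and reports the offset o = TLEN - L: o >= 0 corresponds to pattern 2, o < 0 to pattern 3 (the last L - TLEN bases of each read are adapter), and no acceptable overlap to pattern 1. The function names and the min_overlap/max_mismatch thresholds are hypothetical choices for illustration; this is not AfterQC's code.

```python
# Sketch: classify read pairs into the 3 overlap patterns by aligning read1
# against the reverse complement of read2 (brute force, mismatch-tolerant).
# Assumes read1 and read2 have equal length L; thresholds are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (ACGTN only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_overlap_offset(r1, r2, min_overlap=30, max_mismatch=2):
    """Return o = TLEN - L (position of rc(read2) relative to read1), or None.

    Offsets are tried in order of decreasing overlap length (L - |o|); the
    first offset with at most max_mismatch mismatches is returned.
    """
    L = len(r1)
    if len(r2) != L:
        raise ValueError("this sketch assumes equal read lengths")
    rc2 = reverse_complement(r2)
    for shift in range(0, L - min_overlap + 1):
        for o in ((0,) if shift == 0 else (shift, -shift)):
            if o >= 0:
                # pattern 2: the tail of read1 overlaps the head of rc(read2)
                a, b = r1[o:], rc2[:L - o]
            else:
                # pattern 3: the insert is shorter than L, reads run into adapter
                a, b = r1[:L + o], rc2[-o:]
            if sum(x != y for x, y in zip(a, b)) <= max_mismatch:
                return o
    return None


def classify_pair(r1, r2, **kwargs):
    """Return (pattern, tlen, adapter_bases_per_read) for one read pair."""
    L = len(r1)
    o = find_overlap_offset(r1, r2, **kwargs)
    if o is None:
        return 1, None, 0          # no overlap found: TLEN > 2L (or too noisy)
    tlen = L + o
    if o >= 0:
        return 2, tlen, 0          # L <= TLEN < 2L: no adapter in the reads
    return 3, tlen, L - tlen       # TLEN < L: last L - TLEN bases are adapter


def count_adapter_bases(pairs, **kwargs):
    """Return (number of pattern-3 pairs, total adapter bases) over all pairs."""
    n_pattern3 = adapter_bases = 0
    for r1, r2 in pairs:
        pattern, _, per_read = classify_pair(r1, r2, **kwargs)
        if pattern == 3:
            n_pattern3 += 1
            adapter_bases += 2 * per_read  # adapter tail of read1 plus read2
    return n_pattern3, adapter_bases
```

Feeding it (read1, read2) sequence pairs parsed from your FASTQ files gives the number of pattern-3 pairs and the total adapter bases; on clean data both should be close to 0. Inserts shorter than min_overlap are not detected and fall through to pattern 1, and the brute-force scan is O(L^2) per pair, so it suits a spot check on a sample of reads rather than a whole run.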
BTW, I also developed a tool, AfterQC (http://github.com/OpenGene/AfterQC, Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data), which automatically cuts the adapters by utilizing pattern 3.
Thanks for your response! I apologize that not all experimental design details were provided in my initial post. We have 2x250 MiSeq PE reads. The TLEN in my experiment corresponds to pattern 2 rather than pattern 3. Which tool should I use to search for adapters in the clean data? Is blastn-short OK for that?
You'd better write the program yourself; it is not difficult.
You can take a look at the code of AfterQC and may find your way.
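For pattern 2 data the inserts are longer than the reads, so no read should run into the adapter, and a direct scan for the adapter sequence is a simple way to confirm that. A minimal Python sketch, assuming the adapter begins with AGATCGGAAGAGC (the prefix shared by the standard Illumina TruSeq read1 and read2 adapters; substitute your kit's sequence, since e.g. Nextera libraries use a different one). A read counts as adapter-containing if the full prefix occurs anywhere with at most 1 mismatch, or if at least 5 bases of it run off the 3' end exactly. The function names and thresholds are hypothetical.

```python
import gzip
from itertools import islice

# Assumption: the 13-bp prefix shared by Illumina TruSeq read1/read2 adapters.
ADAPTER = "AGATCGGAAGAGC"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def adapter_start(read, adapter=ADAPTER, max_mismatch=1, min_tail=5):
    """Return the 0-based position where the adapter starts in read, or -1.

    A full-length hit may have up to max_mismatch mismatches; a partial hit
    (adapter prefix running off the 3' end) must be exact and at least
    min_tail bases long.
    """
    n, k = len(read), len(adapter)
    # full-length occurrences anywhere in the read
    for i in range(n - k + 1):
        if hamming(read[i:i + k], adapter) <= max_mismatch:
            return i
    # partial occurrence at the 3' end: read[i:] equals the adapter's start
    for i in range(max(0, n - k + 1), n - min_tail + 1):
        if read[i:] == adapter[:n - i]:
            return i
    return -1


def read_fastq_seqs(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            record = list(islice(fh, 4))
            if len(record) < 4:
                return
            yield record[1].strip()


def count_adapter_reads(path, **kwargs):
    """Return (reads containing the adapter, total reads) for one FASTQ file."""
    hits = total = 0
    for seq in read_fastq_seqs(path):
        total += 1
        if adapter_start(seq, **kwargs) >= 0:
            hits += 1
    return hits, total
```

Run it on read1 and read2 separately, e.g. count_adapter_reads("sample_R1.fastq.gz") (a placeholder file name). An exact 5-base tail match also happens by chance in something on the order of 0.1% of random reads, so a small nonzero count is expected even for clean data; raise min_tail to tighten it.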