Is there any tool that can search for a pattern (while allowing mismatches) in a FASTQ file?
4 months ago
Joshua ▴ 20

Hi,

I want to compare the accuracy of different trimming tools by searching for the adapter sequence in the trimmed FASTQ files. I tried grep, as shown below, but it only finds exact matches. When I search with just part of the adapter sequence, I get more hits, and all of them are near the end of the read, so I think they most likely come from the adapter.

$ zcat sample1_tool1_trimmed.fastq | grep "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" | wc -l
28
$ zcat sample1_tool1_trimmed.fastq | grep "AGATCGGAAGAGCA" | wc -l
143

Is there any tool that can search for an adapter sequence in a FASTQ file while allowing mismatches? This is a one-time analysis, so it's completely fine if the tool takes a long time to run.

Thank you in advance for your suggestions. :)

next-generation-sequencing trimming-tools • 465 views
4 months ago
dsull ★ 6.9k

There are plenty. Check out agrep.

Also, I'm not sure what your goal is. What you're doing is trying to find adapters in your sequencing reads, and that's exactly what adapter trimming tools like cutadapt do. Writing a poor man's version of cutadapt is a strange way to benchmark, and it will certainly have false negatives since you're not using an alignment strategy. Benchmarking should be done against some sort of ground truth, which you can generate by simulation.
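
For what it's worth, a mismatch-tolerant version of the grep count from the question can be roughed out with plain awk. This is only a sketch under some assumptions: `count_adapter_reads` is a made-up helper name, it expects 4-line FASTQ records on stdin, it counts substitutions only (no indels), and the default minimum overlap of 10 bases is an arbitrary choice. Since it is not an alignment, it will still miss adapters that contain insertions or deletions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool).
# Usage: count_adapter_reads ADAPTER MAX_MISMATCHES [MIN_OVERLAP] < reads.fastq
# Prints the number of reads whose sequence contains the adapter, or an
# adapter prefix running off the 3' end, with at most MAX_MISMATCHES
# substitutions. Assumes 4-line FASTQ records.
count_adapter_reads() {
    awk -v adapter="$1" -v maxmm="$2" -v minlen="${3:-10}" '
    NR % 4 == 2 {                          # sequence lines only
        n = length($0); m = length(adapter)
        for (start = 1; start <= n; start++) {
            len = n - start + 1            # bases left in the read
            if (len > m) len = m           # compare at most the full adapter
            if (len < minlen) break        # remaining overlap too short
            mm = 0
            for (i = 0; i < len && mm <= maxmm; i++)
                if (substr($0, start + i, 1) != substr(adapter, i + 1, 1))
                    mm++
            if (mm <= maxmm) { hits++; break }   # count each read once
        }
    }
    END { print hits + 0 }'
}
```

Usage would look like `zcat sample1_tool1_trimmed.fastq | count_adapter_reads AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC 2`, which prints the number of reads with a hit at up to 2 mismatches. It is slow compared to dedicated tools, which should not matter for a one-off check.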

4 months ago
xiaoguang ▴ 160

You can try seqkit or BBMap.
