Dear all,
I have a fastq formatted file and I want to remove reads that DO NOT have a specific sequence in them. The experiment is designed in a way that only reads that have an adapter in them are meaningful for us and the rest of the reads are trash and I have to find a way to get rid of them. Another layer of complication is that I have paired-end reads. I would be grateful if someone can make a suggestion about how to handle this.
Best regards
You can look into tools that split a fastq file based on barcodes and check whether you can adapt them.
For example, QIIME or Stacks.
If you know the exact sequence, you can do it with fastq-grep, but it does not allow mismatches.
An easier way is given below as an answer.
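Grep out the forward and reverse adapter sequences you are looking for, together with one line above (the read header) and two lines below (the "+" line and the quality string) each match, and append the second result to the first. Note that -B sets the number of lines before the match and -A the number after; --no-group-separator (GNU grep) stops grep from writing "--" lines between non-adjacent matches, which would otherwise break the fastq format.
grep -B 1 -A 2 --no-group-separator "adapter sequence" original.fastq > wanted_adapters.fastq
grep -B 1 -A 2 --no-group-separator "reverse_adapter sequence" original.fastq >> wanted_adapters.fastq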
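Since the reads are paired-end, the two files also have to stay in sync, so each pair should be kept or dropped as a unit. Here is a minimal sketch of one way to do that with paste and awk. It assumes both files list the mates in the same order, that every record is exactly four lines, that a pair is kept when either mate contains the adapter, and that matching is exact and case-sensitive (no mismatches). The function name filter_pairs and the file names in the usage comment are made up for illustration.

```shell
# Hypothetical helper: keep read pairs in which either mate contains the adapter.
# Usage: filter_pairs ADAPTER in_R1.fastq in_R2.fastq out_R1.fastq out_R2.fastq
filter_pairs() {
    # Create/empty the outputs so they exist even when nothing matches.
    : > "$4"; : > "$5"
    # First paste: put line N of R1 next to line N of R2.
    # Second paste: join each group of 4 such lines into one 8-field record:
    #   1=header1 2=header2 3=seq1 4=seq2 5=plus1 6=plus2 7=qual1 8=qual2
    paste "$2" "$3" | paste - - - - |
    awk -F'\t' -v a="$1" -v o1="$4" -v o2="$5" '
        # Assumption: exact substring match in either mate keeps the pair.
        index($3, a) || index($4, a) {
            print $1 "\n" $3 "\n" $5 "\n" $7 > o1
            print $2 "\n" $4 "\n" $6 "\n" $8 > o2
        }'
}
```

To also accept the reverse adapter, the awk condition could be extended with a second pattern in the same way. If you need mismatches or degenerate bases, a dedicated tool is a better fit than this kind of exact string search.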
using seqkit:
Flags: -v = invert match, -i = ignore case, -s = search the sequence (rather than the read name), -p = pattern, -d = pattern contains degenerate bases