I have two FASTQ files (read.R1.fastq and read.R2.fastq) from a paired-end sequencing run. Some reads in read.R1.fastq contain an 18-nucleotide linker/tag. How can I extract the reads that contain the linker (allowing up to 3 or 4 mismatches within it) from read.R1.fastq, together with their mates from read.R2.fastq, and save the extracted reads into two separate files? Is it also possible to produce a single file after extraction that contains the full-length sequence of each read pair along with its information (ID, quality scores, etc.)?
Thanks in advance.
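Use cutadapt for this. You can either set a maximum error rate (-e) for the linker match, or write the linker as a pattern with wildcards (e.g. N) at the known positions of variation. Run it in paired-end mode so that the mates in read.R2.fastq are kept or discarded together with their R1 reads.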
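If you just want to see what the extraction step involves, it can be sketched in plain Python. This is a minimal sketch, not how cutadapt works: it counts substitutions only (Hamming distance, no insertions or deletions), searches for the linker anywhere in the R1 read, and assumes strict 4-line FASTQ records with mates in the same order in both files. The LINKER sequence and the helper names (find_linker, mate_id, extract_pairs) are hypothetical placeholders.

```python
from itertools import zip_longest
from typing import Iterator, Optional, TextIO, Tuple

# (header, sequence, "+" line, quality string)
FastqRecord = Tuple[str, str, str, str]

# Hypothetical 18-nt linker: replace with the real linker sequence.
LINKER = "ACCGTTAGCATGCAATGC"


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming strict 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def find_linker(seq: str, linker: str, max_mismatches: int) -> Optional[int]:
    """Start of the leftmost window with the fewest substitutions (<= max_mismatches), else None.

    Only substitutions are counted; an N in the read counts as a mismatch.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for start in range(len(seq) - len(linker) + 1):
        mm = 0
        for read_base, linker_base in zip(seq[start:start + len(linker)], linker):
            if read_base != linker_base:
                mm += 1
                if mm >= best_mm:  # this window cannot beat the best one so far
                    break
        if mm < best_mm:
            best_pos, best_mm = start, mm
    return best_pos


def mate_id(header: str) -> str:
    """Read name without any comment after whitespace and without a /1 or /2 suffix."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def extract_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                  linker: str, max_mismatches: int = 4) -> int:
    """Copy every pair whose R1 read contains the linker to the two outputs; return pairs kept."""
    kept = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        if mate_id(rec1[0]) != mate_id(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        if find_linker(rec1[1], linker, max_mismatches) is not None:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

Opening read.R1.fastq and read.R2.fastq for reading plus two output files of your choice, then calling extract_pairs(r1, r2, out1, out2, LINKER, max_mismatches=4), writes the matching pairs and returns how many were kept. Unlike cutadapt, a linker with an inserted or deleted base will be missed here.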
If you want to merge R1 and R2 into single reads while retaining quality scores etc., try pandaseq. If you want to interleave them into a single file, try bbmap.
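For the single-file question, interleaving two paired FASTQ files (an R1 record, then its R2 mate, and so on) can likewise be sketched in plain Python. This is only a stand-in for bbmap's interleaving, not its implementation: it assumes strict 4-line FASTQ records with mates in the same order in both files, and the interleave helper is hypothetical. Merging overlapping mates into one consensus read, as pandaseq does, is not attempted here.

```python
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

# (header, sequence, "+" line, quality string)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming strict 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def interleave(r1_in: TextIO, r2_in: TextIO, out: TextIO) -> int:
    """Write R1, R2, R1, R2, ... records into one FASTQ stream; return pairs written."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        # Records are copied verbatim, so IDs and quality strings are preserved.
        out.write("\n".join(rec1) + "\n")
        out.write("\n".join(rec2) + "\n")
        pairs += 1
    return pairs
```

Called on the two extracted FASTQ files (opened for reading) and one output file, this yields a single FASTQ in which every pair keeps its IDs and quality strings, which most paired-end tools can read as interleaved input.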