While filtering Illumina MiSeq data, I want to check whether the forward primer matches the beginning of each read (these are single-end reads). Lots of programs let you do this, but they then trim off the primer sequence. I'd actually like to keep the primer sequences in and throw out any reads that don't start with the primer.
*Edit: note that degenerate primers were used, so there are multiple sequences that could be matched.
Does anyone know of a way to do this with fastq files? Ideally I'd like to use Trimmomatic's ILLUMINACLIP command, but I haven't figured out a way to stop it from clipping the sequence.
The mothur command trim.seqs does this for fasta files (when keepforward=T), but I'd rather not have to waste time converting between fastq and fasta since this is a pipeline I will need to re-run a lot.
Thanks in advance!
Gavin
Why not use one of the barcode demultiplexing tools, with the degenerate primer sequences as your barcodes? You could cat all of the matched reads together at the end.
Alternatively, I'm 95% certain that BBMap can do the trick. After all, it seems to do everything else ;-).
As a matter of fact, it can!
e.g. for the adapter sequence GGACTGANNCGA
Here, literal is the literal sequence to look for; k should be set to the exact length of that sequence; restrictleft is optional but in this case will tell it to only look for matches in the first 12 bases (so it will not accept something where the primer is in the middle, for example); rcomp=f tells it to not look for reverse-complemented sequence; and copyundefined tells it to make copies of the literal sequence with all possible combinations of the degenerate bases.
Thanks, this is perfect!
Aha, I knew it!
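For anyone who wants to see the logic spelled out, here is a minimal Python sketch of the idea described above: expand the degenerate primer into every concrete sequence, then keep only reads whose first bases match one of them, without trimming anything. This is not BBMap's implementation, just an illustration. The function names (expand_primer, read_fastq, filter_by_primer) are made up for this example, it assumes plain 4-line FASTQ records, and it only accepts exact matches (no mismatches allowed).

```python
from itertools import product

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every concrete sequence a degenerate primer can represent."""
    choices = [IUPAC[base] for base in primer.upper()]
    return {"".join(combo) for combo in product(*choices)}

def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops iteration
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual

def filter_by_primer(in_handle, out_handle, primer):
    """Write only reads that start with the primer; reads are left untrimmed.

    Returns the number of reads kept.
    """
    variants = expand_primer(primer)
    k = len(primer)
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        # Compare only the first k bases, so a primer in the middle is rejected.
        if seq[:k].upper() in variants:
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept
```

To use it on files, open the input and output yourself (the file names here are placeholders), e.g. `with open("reads.fastq") as fin, open("matched.fastq", "w") as fout: filter_by_primer(fin, fout, "GGACTGANNCGA")`. For a pipeline that gets re-run a lot, a dedicated tool will be much faster than this.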