I have a file containing a lot of sequences that are all 8 bp long. I only want to keep the sequences that match the pattern GTNNNNTG. I thought about using cutadapt with the linked adapter option to filter the sequences:
cutadapt -a ^GT...TG --discard-untrimmed
However, this trims the GT and TG bases from the ends of each read. I want to keep them on the reads, so that I end up with a file of 8 bp sequences that all match GTNNNNTG.
Does anyone have any other ideas for how I could do this?
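One way this might be done without trimming anything is to test each read against a regular expression and write out only the reads that match. The following is a minimal sketch, not a tested pipeline. It assumes the file is FASTQ with exactly four lines per record, and that N means any of A, C, G, T or N. The names `matches_motif` and `filter_fastq` are made up for illustration.

```python
import re

# Assumption: "N" in GTNNNNTG stands for any single base (A, C, G, T or N).
MOTIF = re.compile(r"GT[ACGTN]{4}TG")


def matches_motif(seq):
    """Return True if the whole sequence matches GTNNNNTG (case-insensitive)."""
    return MOTIF.fullmatch(seq.strip().upper()) is not None


def filter_fastq(lines):
    """Yield the lines of every 4-line FASTQ record whose sequence matches.

    Assumption: each record is exactly four lines (header, sequence, '+',
    qualities). Matching records are yielded unchanged, so GT and TG stay
    on the reads.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if matches_motif(record[1]):
                yield from record
            record = []
```

To use it, the lines of an opened input file could be passed to `filter_fastq` and each yielded line written to an output file. Because `fullmatch` is used, reads that are shorter or longer than 8 bp are rejected too. For FASTA or one-sequence-per-line input, the record handling would need to be adapted, but `matches_motif` could be reused as is.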
Yes this does the job! Thanks so much