Hi all,
I want to search sequences for a pattern made of conserved flanks with a wildcard segment of variable length between them.
In particular, I am working with paired-end RADseq data and looking for short loci (reads that run through into the adapter), with the aim of trimming the ligation_adapter from R1 and the cut_site_1-barcode-ligation_adapter from R2.
Such reads look like this:
R1: cut_site_1-NNNNNNNNNN-cut_site_2-ligation_adapter
R2: cut_site_2-NNNNNNNNNN-cut_site_1-barcode-ligation_adapter
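For R1 the junction is fixed, so a plain substring search should be enough. Below is a minimal sketch, assuming exact matches only; the cut site and adapter sequences are hypothetical placeholders to be replaced with the real ones, and `trim_r1` is an illustrative name.

```python
# Hypothetical placeholder sequences: substitute the real cut site and adapter.
CUT_SITE_2 = "CCGG"
LIGATION_ADAPTER = "AGATCGGAAG"


def trim_r1(seq: str) -> str:
    """Remove the ligation adapter (and anything after it) from an R1 read.

    Looks for the exact junction cut_site_2 + ligation_adapter, keeps the
    cut site itself, and returns the read unchanged if no junction is found.
    """
    junction = CUT_SITE_2 + LIGATION_ADAPTER
    idx = seq.find(junction)
    if idx == -1:
        return seq
    return seq[: idx + len(CUT_SITE_2)]


if __name__ == "__main__":
    read1 = "TGCAG" + "ACGTACGTAC" + CUT_SITE_2 + LIGATION_ADAPTER + "TTTT"
    print(trim_r1(read1))
```

This does not handle sequencing errors or a read that ends partway into the adapter; those cases would need an approximate or 3'-end-anchored match.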
The problem is trimming R2: cut_site_1 and the ligation_adapter are conserved, but there are 96 different barcodes, whose sequences are 4-8 bp long. I think I should use a wildcard, but how do I specify a wildcard whose length can also vary?
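One way to express a variable-length wildcard is a regular expression with a bounded quantifier: `[ACGTN]{4,8}` means "any 4 to 8 bases". The sketch below assumes exact matching and uses hypothetical placeholder sequences and function names. It shows both a generic pattern and a stricter variant that only accepts barcodes from a known list (for the 96 barcodes, the list would be read from the barcode file).

```python
import re

# Hypothetical placeholder sequences: substitute the real cut site and adapter.
CUT_SITE_1 = "TGCAG"
LIGATION_ADAPTER = "AGATCGGAAG"

# Generic version: the barcode is any 4-8 bases between the conserved flanks.
R2_JUNCTION = re.compile(CUT_SITE_1 + r"[ACGTN]{4,8}" + LIGATION_ADAPTER)


def trim_r2(seq: str) -> str:
    """Cut an R2 read at the start of cut_site_1-barcode-ligation_adapter."""
    m = R2_JUNCTION.search(seq)
    if m is None:
        return seq
    return seq[: m.start()]


def build_r2_pattern(barcodes):
    """Stricter version: only accept barcodes from a known list.

    The barcode alternatives are placed in a named group so the matched
    barcode can be reported; longer barcodes are listed first.
    """
    alts = "|".join(re.escape(b) for b in sorted(barcodes, key=len, reverse=True))
    return re.compile(CUT_SITE_1 + "(?P<barcode>" + alts + ")" + LIGATION_ADAPTER)


def trim_r2_known(seq: str, pattern):
    """Return (trimmed read, matched barcode), or (read, None) if no match."""
    m = pattern.search(seq)
    if m is None:
        return seq, None
    return seq[: m.start()], m.group("barcode")


if __name__ == "__main__":
    read2 = "CCGG" + "ACGTACGTAC" + CUT_SITE_1 + "ACGTAC" + LIGATION_ADAPTER + "TTTT"
    print(trim_r2(read2))
    print(trim_r2_known(read2, build_r2_pattern(["ACGT", "ACGTAC", "GGTTCCAA"])))
```

Like any exact regex, this misses reads with mismatches in the flanks or reads that stop inside the adapter; the third-party `regex` module supports approximate (fuzzy) matching if that turns out to matter.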
Glib