Trimming SSR (repetitive primer) sequences

Posted 2.1 years ago by pixie@bioinfo

Hello, my lab is working on SSR sequencing: we have designed specific SSR primers and are trying to capture the regions between consecutive SSR primers. So far I have been using "seqkit locate" with exact matching to find the primer+anchor sequences. I have not yet run any QC on the demultiplexed data, so this is on the raw reads.

zcat 221027_MN01111_0087_A000H535FM.XXXX.R1.fastq.gz | seqkit locate -f pattern.fa >221027_MN01111_0087_A000H535FM_XXX_R1_locate.txt
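One way to make the matching more tolerant is to allow a few substitutions. The Python sketch below is hypothetical and is not how seqkit is implemented: `locate` scans the forward strand only and reports 0-based `(name, start, end)` hits. With `max_mismatch=0` it behaves like an exact search; raising it tolerates substitutions but not insertions or deletions. The primer name and sequence are made up for illustration.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def locate(read: str, primers: Dict[str, str],
           max_mismatch: int = 0) -> List[Tuple[str, int, int]]:
    """Return (primer_name, start, end) hits on the forward strand.

    start is 0-based and end is exclusive. max_mismatch=0 gives an exact
    search; larger values allow that many substitutions (no indels).
    """
    read = read.upper()
    hits = []
    for name, primer in primers.items():
        primer = primer.upper()
        k = len(primer)
        # Slide a window of the primer's length along the read.
        for start in range(len(read) - k + 1):
            if hamming(read[start:start + k], primer) <= max_mismatch:
                hits.append((name, start, start + k))
    return hits


if __name__ == "__main__":
    primers = {"p1": "GTCACACACACA"}  # hypothetical anchored (CA)n primer
    print(locate("TTGTCACAGACACAGG", primers))                  # []
    print(locate("TTGTCACAGACACAGG", primers, max_mismatch=1))  # [('p1', 2, 14)]
```

Because SSR primers are themselves repetitive, a generous mismatch allowance can start reporting shifted hits along a long repeat, so it is probably best kept small.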

However, I noticed that some reads begin with a partial repeat primer, or even a partial primer followed by a complete primer (similar to a primer dimer). For example:

R1 of the FASTQ file (example read not reproduced here)

Here, what I took to be the ISSR (the region between two repeats) is actually another partial repeat primer from my list. How can I make the search pattern more flexible? Are there any tools I could try? Thanks.
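For primer-derived sequence at the 5' end (a partial primer, or a partial primer followed by a complete primer), a suffix-overlap check is one possible approach. The hypothetical helper `strip_leading_primers` below repeatedly removes the longest primer suffix of at least `min_overlap` bases found at the start of the read (a complete primer counts as its own longest suffix) and records what it removed. It is a sketch under stated assumptions: exact matching only, primers given 5'→3' as they appear in the read, and made-up primer names and sequences.

```python
from typing import Dict, List, Tuple


def primer_suffix_at_start(read: str, primer: str, min_overlap: int) -> int:
    """Length of the longest suffix of `primer` (>= min_overlap) that begins `read`, else 0."""
    for length in range(min(len(primer), len(read)), min_overlap - 1, -1):
        if read.startswith(primer[-length:]):
            return length
    return 0


def strip_leading_primers(read: str, primers: Dict[str, str],
                          min_overlap: int = 6) -> Tuple[str, List[Tuple[str, int]]]:
    """Greedily remove primer-derived sequence from the 5' end of a read.

    Each round removes the longest primer suffix found at the read start
    (a complete primer is its own longest suffix). Stops when no suffix of
    at least min_overlap bases matches. Returns the trimmed read and the
    (primer_name, bases_removed) pairs in the order they were stripped.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    read = read.upper()
    removed: List[Tuple[str, int]] = []
    while True:
        best_name, best_len = "", 0
        for name, primer in primers.items():
            n = primer_suffix_at_start(read, primer.upper(), min_overlap)
            if n > best_len:
                best_name, best_len = name, n
        if best_len == 0:
            break
        removed.append((best_name, best_len))
        read = read[best_len:]  # the read shrinks every round, so the loop ends
    return read, removed


if __name__ == "__main__":
    # Hypothetical primers; the read is a partial p2 + complete p1 + insert.
    primers = {"p1": "GTCACACACACA", "p2": "AGTGAGAGAGAG"}
    read = "GAGAGAG" + "GTCACACACACA" + "TTTGGGCCCAAT"
    print(strip_leading_primers(read, primers))
    # ('TTTGGGCCCAAT', [('p2', 7), ('p1', 12)])
```

One caveat for SSR data: this greedy loop will also consume genuine repeat units that continue past the primer (for example genomic (CA)n right after a (CA)n primer), so `min_overlap` and the number of rounds would need tuning against real reads.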

Please post some actual data in text format. bbduk.sh (from BBTools) would be another tool to try.

Reply by GenoMax, 2.1 years ago

Login before adding your answer.