Trimming SSR (repetitive primer) sequences
2.1 years ago
pixie@bioinfo ★ 1.5k

Hello, my lab is working on SSR sequencing. We designed specific SSR primers and are trying to capture the regions between consecutive SSR primers. So far I have been using "seqkit locate" to find exact matches to the primer+anchor sequences. I have not yet done any QC on the demultiplexed data, so this is run on the raw reads.

zcat 221027_MN01111_0087_A000H535FM.XXXX.R1.fastq.gz | seqkit locate -f pattern.fa >221027_MN01111_0087_A000H535FM_XXX_R1_locate.txt

However, I noticed that we pick up partial repeat primer sequences, or even a partial primer followed by a complete primer, at the beginning of the read (like a primer dimer). An example:

R1 of the fastq file

Here, what I took to be the ISSR (the region between two repeats) is actually another partial repeat primer from my list. How can I make the search pattern more flexible? Are there any tools I could try? Thanks.

genomics • 542 views

Please post some actual text format data. bbduk.sh would be another tool to try.
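
Until real reads are posted, here is a minimal Python sketch of what a "more flexible" search could look like. It is only an illustration, not how seqkit or bbduk.sh work internally. The primer and read are made up, and the function names (`locate_fuzzy`, `leading_partial`) and the `min_len` cutoff are hypothetical. It assumes substitutions only (no indels) and plain A/C/G/T primers with no degenerate bases.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def locate_fuzzy(read, primer, max_mismatch=1):
    """Return (start, end, mismatches) for every read window that is
    within max_mismatch substitutions of the full primer."""
    k = len(primer)
    hits = []
    for i in range(len(read) - k + 1):
        d = hamming(read[i:i + k], primer)
        if d <= max_mismatch:
            hits.append((i, i + k, d))
    return hits


def leading_partial(read, primer, min_len=6, max_mismatch=0):
    """Length of the longest proper primer suffix (at least min_len bases)
    found at the very start of the read, or 0 if there is none."""
    for n in range(len(primer) - 1, min_len - 1, -1):
        if len(read) < n:
            continue
        if hamming(read[:n], primer[-n:]) <= max_mismatch:
            return n
    return 0


# Hypothetical primer: (AG)8 repeat plus a 2-base anchor "TC".
primer = "AGAGAGAGAGAGAGAG" + "TC"

# Made-up read: truncated primer, 8 bases of insert, full primer, tail.
read = "GAGAGAGTC" + "ACGTTGCA" + primer + "TTT"

print(leading_partial(read, primer))   # partial primer length at the 5' end
print(locate_fuzzy(read, primer, 0))   # exact full-length primer hits
```

Because the primers are repeats, a short suffix can match in several registers, so a cutoff like `min_len` needs tuning against real data. If you would rather stay with existing tools, it may be worth checking whether `seqkit locate --help` in your version lists a mismatch option, and cutadapt is also designed for error-tolerant and partial adapter matching.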
