Hello!
I have a question about trimming custom/staggered adapter sequences in a FASTQ file.
I have a library with 4 different adapter lengths on the 5' side; this was done to increase library diversity during sequencing. However, now I want to trim off the adapter and I can't do it by length!
Here are the 4 potential adapter sequences:
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCGGGGACTTATCAGCCAACC
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNCGGGGACTTATCAGCCAACC
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCGGGGACTTATCAGCCAACC
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNCGGGGACTTATCAGCCAACC
And now I would like to trim off all "CGGGGACTTATCAGCCAACC" (and everything upstream!) from my reads - does anyone know how I can do it?
Thanks a lot for any help! Please let me know if further information is needed.
Joyce
Looks like a job for a Perl regex.
Thanks Karl. I have close to zero knowledge of programming, maybe a little bit of Python. I was hoping to be able to use an established program such as Trimmomatic. I will keep looking around. Thanks! -joyce
I don't know enough Perl to give you the answer. I suppose someone else will. But sometimes you have to do some programming to solve problems! Perl's regular expressions ("regex") give you a really easy string search that can find the index of your sequence in each read and cut at the appropriate place. You could do it in Python if you'd like, or in any other programming language.
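For what it's worth, here is a minimal Python sketch of that find-the-index-and-cut idea. It rests on some assumptions that are not from the original post: the FASTQ is uncompressed with exactly 4 lines per record, the constant part "CGGGGACTTATCAGCCAACC" appears in the read without sequencing errors, and the first occurrence is the one to cut at. The function names and file names are made up for illustration.

```python
import re

# Constant tail shared by all four staggered adapters (taken from the post).
ANCHOR_RE = re.compile("CGGGGACTTATCAGCCAACC")


def trim_record(header, seq, plus, qual):
    """Drop the anchor and everything upstream; trim qualities to match."""
    match = ANCHOR_RE.search(seq)  # leftmost exact match, or None
    if match is None:
        # Assumption: reads without the anchor are passed through unchanged.
        return header, seq, plus, qual
    cut = match.end()  # index of the first base after the anchor
    return header, seq[cut:], plus, qual[cut:]


def trim_fastq(in_path, out_path):
    """Trim every record of a plain-text, 4-lines-per-record FASTQ file."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # empty header line: end of file
                break
            fout.write("\n".join(trim_record(*record)) + "\n")
```

It could be called as trim_fastq("reads.fastq", "reads.trimmed.fastq"), where both file names are placeholders. Because the match is exact, a read with even one mismatch in the anchor comes out untrimmed, and a read that ends right at the anchor comes out with an empty sequence, so both cases may need filtering afterwards.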
Thanks Karl! I am in the process of learning Python - programming is absolutely crucial indeed!