Hi
I have a problem that is hard to search for on Google. I need to trim the reads in my FASTQ files up to a known sequence, removing everything before that sequence but keeping the sequence itself.
This
someRandomNoise_aKnownSequence_unknownSequence
becomes
aKnownSequence_unknownSequence
All the tools I use, and that I have seen, would remove both the "someRandomNoise" and the "aKnownSequence".
I could try to find the location of the sequence in each read and then trim them in a loop, but this seems very inefficient.
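For what it's worth, the find-then-trim loop is a single linear pass over the file, so it may be less inefficient than it sounds. Below is a minimal Python sketch of that idea, not a tested pipeline. The function names (trim_before, trim_fastq) are made up for illustration, and it assumes exact matches only, standard 4-line FASTQ records (no wrapped sequence lines), and that reads lacking the known sequence should be passed through unchanged.

```python
import gzip
from itertools import islice


def trim_before(seq, qual, known):
    """Drop everything before the first exact match of `known`, keeping `known`."""
    pos = seq.find(known)
    if pos == -1:
        # Assumption: reads without the known sequence are left untouched.
        return seq, qual
    # Slice the quality string at the same position so it stays aligned.
    return seq[pos:], qual[pos:]


def _open(path, mode):
    """Open plain or gzip-compressed text files based on the file extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def trim_fastq(in_path, out_path, known):
    """Stream a FASTQ file record by record and write the trimmed reads."""
    with _open(in_path, "rt") as fin, _open(out_path, "wt") as fout:
        while True:
            record = list(islice(fin, 4))  # header, sequence, '+', qualities
            if len(record) < 4:
                break
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            seq, qual = trim_before(seq, qual, known)
            fout.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```

A call would look like trim_fastq("untrimmed.fq.gz", "trimmed.fq.gz", "ACGT"), with your own sequence in place of "ACGT". Because it only finds exact matches, reads with sequencing errors inside the known sequence would slip through untrimmed, so a mismatch-tolerant search would be needed for real data.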
To verify your trimming results, you might like to clone our visualisation tool Trimviz, see example report here (currently in beta testing). I apologize for the shameless plug, but it's exactly this kind of non-standard trimming situation for which I envisaged it would be useful. Dependencies include a few common R and Python libs, plus samtools (and ideally seqtk). In FQ mode, give it the pre-trimmed and post-trimmed FASTQ file names:
python path/to/trimviz.py FQ -u <untrimmed.fq.gz> -t <trimmed.fq.gz> -o <outdir>
Add -k 50000 if you don't have seqtk installed or are in a hurry. I imagine you would see a big block of vertical stripes around the 5' trimming site in the sequence heat-maps, corresponding to the desired target sequence. If that block is on the RIGHT of the 5' trimming site, then the sequence has been successfully retained in your reads and everything before it has been trimmed.