Split Multimer Sequences at motif in FASTQ
Hi,
I have a FASTQ file containing PCR multimers, and I need to split each sequence at a known primer sequence (a kind of demultiplexing), but I want the primer to remain at the start of each fragment.
My input looks like this (the primer is GGGTCAGTAGCGGAC, set off by spaces here):
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGAC GGGAACTGCATCACGCAATACGACTCACTATA GGGTCAGTAGCGGAC ......
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.......
Desired output:
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0:1
GGGTCAGTAGCGGAC GGGAACTGCATCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0:2
GGGTCAGTAGCGGAC ......
+
FFFFFFFFFFFFFFFFFFFFFF.......
Thanks a lot!
If I understand what you want, the following should work. It uses bbduk.sh from the BBMap suite.
$ more test.fq
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCATCACGCAATACGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1102:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAAACGTCGCACGCAATACGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1103:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCACGTCAGCTGGCGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Do the trimming.
$ bbduk.sh -Xmx2g in=test.fq outu=stdout.fq literal=GGGTCAGTAGCGGAC ktrim=r restrictright=20 -da k=7
Version 38.35
0.020 seconds.
Initial:
Memory: max=2147m, total=2147m, free=2129m, used=18m
Added 9 kmers; time: 0.005 seconds.
Memory: max=2147m, total=2147m, free=2126m, used=21m
Input is being processed as unpaired
Started output streams: 0.012 seconds.
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCATCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1102:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAAACGTCGCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1103:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCACGTCAGCTGGCGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Processing time: 0.005 seconds.
Input: 3 reads 186 bases.
KTrimmed: 3 reads ( 100.00%) 45 bases ( 24.19%)
Total Removed: 0 reads ( 0.00%) 45 bases ( 24.19%)
Result: 3 reads ( 100.00%) 141 bases ( 75.81%)
Time: 0.025 seconds.
Reads Processed: 3 0.12k reads/sec
Bases Processed: 186 0.01m bases/sec
Change stdout.fq to a file name to write the result to a file instead of STDOUT. Use in1= in2= outu1= outu2= if you have paired-end data.
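In the bbduk.sh output above, the trailing primer copy is trimmed away rather than written as its own record. If each fragment is needed as a separate read, as in the desired output in the question, a small script is one option. Below is a minimal Python sketch, not part of BBMap. The function names (split_record, read_fastq, split_fastq) are made up for illustration, and it assumes exact primer matches, plain four-line FASTQ records, and uncompressed input.

```python
import re

PRIMER = "GGGTCAGTAGCGGAC"  # primer sequence taken from the question


def split_record(header, seq, qual, primer=PRIMER):
    """Split one read at every exact primer match.

    Each fragment starts with the primer; a fragment index (:1, :2, ...)
    is appended to the header. Reads without the primer are returned
    unchanged. Mismatches in the primer are NOT tolerated (assumption).
    """
    starts = [m.start() for m in re.finditer(re.escape(primer), seq)]
    if not starts:
        return [(header, seq, qual)]
    if starts[0] != 0:
        # Keep any bases before the first primer as their own fragment.
        starts.insert(0, 0)
    ends = starts[1:] + [len(seq)]
    return [
        (f"{header}:{i}", seq[s:e], qual[s:e])
        for i, (s, e) in enumerate(zip(starts, ends), start=1)
    ]


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_fastq(in_path, out_path, primer=PRIMER):
    """Write every primer-delimited fragment of in_path to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for header, seq, qual in read_fastq(fin):
            for h, s, q in split_record(header, seq, qual, primer):
                fout.write(f"{h}\n{s}\n+\n{q}\n")
```

Calling split_fastq("test.fq", "split.fq") on the test file above should write two records per read: the first primer plus insert, then the trailing primer copy. Fragments that consist of the primer alone may need to be filtered out afterwards, and reads with sequencing errors inside the primer will not be split by this exact-match approach.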
With cutadapt:
input:
output:
Updated to remove the primer at the end: -e 0 sets the allowed errors to zero, as the third read differs by one base at the 3' end.