Split Multimer Sequences at motif in FASTQ
Hi,
I have a FASTQ file containing PCR multimers, and I need to split the sequences at a known primer sequence (a kind of demultiplexing), but I want the primer to remain in each resulting read.
My input looks like this (primer GGGTCAGTAGCGGAC, set off by spaces):
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGAC GGGAACTGCATCACGCAATACGACTCACTATA GGGTCAGTAGCGGAC ......
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.......
Desired output:
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0:1
GGGTCAGTAGCGGAC GGGAACTGCATCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0:2
GGGTCAGTAGCGGAC ......
+
FFFFFFFFFFFFFFFFFFFFFF.......
Thanks a lot!
Tags: sequencing, next-gen, fastq, split, motif
written 3.7 years ago by schmau, updated 3.7 years ago by GenoMax
If I understand what you want, then the following should work. It uses bbduk.sh from the BBMap suite.
$ more test.fq
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCATCACGCAATACGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1102:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAAACGTCGCACGCAATACGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1103:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCACGTCAGCTGGCGACTCACTATAGGGTCAGTAGCGGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Do the trimming.
$ bbduk.sh -Xmx2g in=test.fq outu=stdout.fq literal=GGGTCAGTAGCGGAC ktrim=r restrictright=20 -da k=7
Version 38.35
0.020 seconds.
Initial:
Memory: max=2147m, total=2147m, free=2129m, used=18m
Added 9 kmers; time: 0.005 seconds.
Memory: max=2147m, total=2147m, free=2126m, used=21m
Input is being processed as unpaired
Started output streams: 0.012 seconds.
@A00877:568:HVV57DSXY:4:1101:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCATCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1102:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAAACGTCGCACGCAATACGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00877:568:HVV57DSXY:4:1103:27724:1408 1:N:0
GGGTCAGTAGCGGACGGGAACTGCACGTCAGCTGGCGACTCACTATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Processing time: 0.005 seconds.
Input: 3 reads 186 bases.
KTrimmed: 3 reads (100.00%) 45 bases (24.19%)
Total Removed: 0 reads (0.00%) 45 bases (24.19%)
Result: 3 reads (100.00%) 141 bases (75.81%)
Time: 0.025 seconds.
Reads Processed: 3 0.12k reads/sec
Bases Processed: 186 0.01m bases/sec
Change stdout.fq to a file name to write the result to a file instead of STDOUT. Use in1= in2= outu1= outu2= if you have paired-end data.
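In the output above, the trailing primer copy is trimmed off rather than written out as a second record. If each primer-initiated unit should instead become its own read with :1, :2, ... appended to the header, as in the question's desired output, a minimal Python sketch could look like the one below. It is not part of BBMap. The function names split_at_primer and split_fastq are made up for illustration, and it assumes exact primer matches, well-formed 4-line FASTQ records, and the forward orientation only.

```python
from typing import Iterable, Iterator, List, Tuple

PRIMER = "GGGTCAGTAGCGGAC"  # primer sequence taken from the question


def split_at_primer(seq: str, qual: str, primer: str = PRIMER) -> List[Tuple[str, str]]:
    """Split one read into units that each begin at an exact primer match.

    Hypothetical helper: exact, non-overlapping matches only.
    """
    starts = []
    pos = seq.find(primer)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(primer, pos + len(primer))
    if not starts:
        return [(seq, qual)]  # no primer found: leave the read unchanged
    if starts[0] != 0:
        starts.insert(0, 0)  # keep bases before the first primer as their own unit
    ends = starts[1:] + [len(seq)]
    # Slice sequence and quality string with the same coordinates.
    return [(seq[s:e], qual[s:e]) for s, e in zip(starts, ends)]


def split_fastq(lines: Iterable[str], primer: str = PRIMER) -> Iterator[str]:
    """Yield FASTQ lines with every read split and ':<n>' appended to its header."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of the file
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, rewritten below
        qual = next(it).rstrip("\n")
        for i, (s, q) in enumerate(split_at_primer(seq, qual, primer), start=1):
            yield f"{header}:{i}\n"
            yield f"{s}\n"
            yield "+\n"
            yield f"{q}\n"
```

To use it on a file, open test.fq for reading and a new file for writing, then pass the input handle to split_fastq and hand the result to writelines on the output handle. Primer copies with sequencing errors will be missed by the exact match, so a tool with error-tolerant matching is the safer choice for real data.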
with cutadapt:
input:
output:
Updated to remove the primer at the end: -e 0 sets the allowed errors to zero, because the third read differs by one base at the 3' end.
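To illustrate what allowing zero errors changes, here is a small Python sketch that counts substitutions between the primer and every window of a read. It is only an illustration, not how cutadapt works internally: as far as I know, cutadapt uses an error-tolerant alignment that also handles insertions and deletions, and -e is interpreted as an error rate when it is below 1. The helper names hamming and primer_hits and the toy sequence are assumptions made up for this example.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def primer_hits(seq: str, primer: str, max_mismatches: int = 0) -> List[int]:
    """Start positions where the primer matches with at most max_mismatches substitutions."""
    k = len(primer)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], primer) <= max_mismatches
    ]


primer = "GGGTCAGTAGCGGAC"
# Toy read: an exact primer copy, a spacer, then a copy with one wrong final base.
toy = primer + "AAAA" + "GGGTCAGTAGCGGAT"

exact_only = primer_hits(toy, primer, max_mismatches=0)
one_error = primer_hits(toy, primer, max_mismatches=1)
print(exact_only, one_error)
```

With zero mismatches allowed only the exact copy is reported, while allowing one mismatch also reports the imperfect copy. That is the kind of difference the error setting controls.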