How to extract reads with exact match with BBDuk?
0
0
Entering edit mode
4.2 years ago
camelest ▴ 50

Hi, I'm pretty new to this area and is wondering anyone could help. I have a pair-ended RNA-seq data. What I'd like to do is to extract reads with a certain sequence in the 5' end of read2 from a fastq file, for example TTTTT. I've tried with BBDuk as below.

bbduk.sh -Xmx1g in1=read1.fastq in2=read2.fastq outm1=read1.filtered.fastq outm2=read2.filtered.fastq literal=TTTTT k=5 rcomp=f mm=f restrictleft=5

then I get read2.filtered.fastq mostly starting from TTTTT. However, when I do

cat read2.filtered.fastq | awk '(NR%4==2){print}' | grep -v "^TTTTT"

then there are many reads that do not start with TTTTT. For example,

ACTTACTGAGTTTTAAATGGTATAAATTTCTGGCATCTGGCAGGTG CTGCCACACACCCCAGTGGGCTTGGGGCCTGGCTGGAACTATTCAC GTCCCAAATATTCACTATGCCTTCTTTGGTGCCGGAAACTAACAGT TCCAAAACATTATCATTTCAATATGTAATCAACATAAAAAAATAAG

Why are these reads extracted? Where did I do wrong? Any suggestion would be really appreciated.

RNA-Seq fastq BBDuk • 1.5k views
ADD COMMENT
1
Entering edit mode

Maybe read 1 starts with TTTTT?

ADD REPLY
1
Entering edit mode

Sorry for the late reply. Thank you so much for your input. I found that you are totally correct. Is there any good ways in order to extract paired-ends with only read2 starting with TTTTT but not read1? I was thinking only use the read2 file as the input, do the same thing and use repair.sh in1=read1.fastq in2=read2.filtered.fastq out1=fixed1.fastq out2=fixed2.fastq outs=singletons.fq repair.

ADD REPLY
1
Entering edit mode

That would be the way to go.

ADD REPLY
0
Entering edit mode

Thank you for your comment. This was really helpful.

ADD REPLY
1
Entering edit mode

You should be able to use a pipe to do this in one step. Give it a try.

bbduk.sh -Xmx1g in=read2.fastq outm=stdout.fastq  literal=TTTTT k=5 rcomp=f mm=f restrictleft=5 | repair.sh in1=read1.fastq in2=stdin.fastq out1=fixed1.fastq out2=fixed2.fastq outs=singletons.fq repair
ADD REPLY

Login before adding your answer.

Traffic: 2835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6