Sequence parsing question
2.8 years ago
predeus ★ 2.1k

Hi all,

I was wondering if anybody knows a smart way (a command or a one-liner) to accomplish the following task.

I have a paired-end fastq file with lots of Ns in it.

I need to get a subset of 25,000 matched paired-end reads with no Ns in them.

seqtk seq can do something similar, but it doesn't work with paired-end reads.

Would welcome any suggestions!

Thank you in advance.

seqtk fastq sequence parsing • 809 views

Try --max-n 0 with cutadapt.

2.8 years ago
GenoMax 148k

Untested, but this should work using reformat.sh from the BBMap suite.

reformat.sh -Xmx4g in1=R1.fq.gz in2=R2.fq.gz out=stdout.fq.gz maxns=0 | reformat.sh -Xmx4g in=stdin.fq.gz int=t out1=Sampled.R1.fq.gz out2=Sampled.R2.fq.gz samplereadstarget=25000

If fewer than 25K reads remain after the initial filtering, you could add upsample=t to still get 25K.
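For anyone who wants to see the logic spelled out without BBMap, here is a rough Python sketch of the same two steps: drop any pair in which either mate contains an N, then draw a random subset of the surviving pairs. It is not how reformat.sh works internally. It assumes both files are plain 4-line FASTQ with mates in the same order, and the function names (read_fastq, sample_clean_pairs, write_pairs) are made up for this example. Reservoir sampling is used so the whole file never has to sit in memory.

```python
import gzip
import random
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = tuple(line.rstrip("\n") for line in islice(handle, 4))
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            yield record


def sample_clean_pairs(r1_path, r2_path, n=25000, seed=0):
    """Return up to n random read pairs in which neither mate contains an N.

    Assumes R1 and R2 list their reads in the same order.
    """
    rng = random.Random(seed)
    reservoir = []
    kept = 0  # number of N-free pairs seen so far
    for rec1, rec2 in zip(read_fastq(r1_path), read_fastq(r2_path)):
        # Discard the whole pair if either mate has an ambiguous base.
        if "N" in rec1[1].upper() or "N" in rec2[1].upper():
            continue
        kept += 1
        if len(reservoir) < n:
            reservoir.append((rec1, rec2))
        else:
            # Reservoir sampling: replace a random slot with probability n / kept.
            j = rng.randrange(kept)
            if j < n:
                reservoir[j] = (rec1, rec2)
    return reservoir


def write_pairs(pairs, out1_path, out2_path):
    """Write sampled pairs to two uncompressed FASTQ files, keeping mates in sync."""
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in pairs:
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
```

Usage would be something like write_pairs(sample_clean_pairs("R1.fq.gz", "R2.fq.gz"), "Sampled.R1.fq", "Sampled.R2.fq"). Unlike upsample=t, this sketch simply returns fewer than 25,000 pairs if not enough clean ones exist.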


neat, thank you!
