Question

Sequence parsing question

0

2.8 years ago

predeus ★ 2.1k

Hi all,

I was wondering if anybody knows a smart way (a command or a one-liner) to accomplish the following task.

I have a paired-end fastq file with lots of Ns in it.

I need to get a subset of 25,000 matched read pairs in which neither read contains an N.

seqtk seq has an option that does something similar, but it doesn't work with paired-end reads.

Would welcome any suggestions!

Thank you in advance.

seqtk fastq sequence parsing • 809 views

updated 2.8 years ago by cpad0112 21k • written 2.8 years ago by predeus ★ 2.1k

0

Try --max-n 0 with cutadapt.
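
As a rough illustration of how that option might fit the paired-end case: in paired mode cutadapt by default discards a pair when either mate fails a filter, and seqtk sample keeps mates together if both files are sampled with the same seed. The sketch below assumes cutadapt and seqtk are installed; the function name, the noN.* and sub.* file names, and the default seed are placeholders, not anything from this thread.

```shell
# Hypothetical helper: filter_and_sample R1.fq.gz R2.fq.gz [n_pairs] [seed]
filter_and_sample() {
    r1=$1
    r2=$2
    n=${3:-25000}
    seed=${4:-100}

    # Discard every pair in which either mate contains at least one N.
    cutadapt --max-n 0 -o noN.R1.fq.gz -p noN.R2.fq.gz "$r1" "$r2"

    # Using the same seed on both files selects the same records, so mates stay paired.
    seqtk sample -s"$seed" noN.R1.fq.gz "$n" > sub.R1.fq
    seqtk sample -s"$seed" noN.R2.fq.gz "$n" > sub.R2.fq
}
```

Called as filter_and_sample R1.fq.gz R2.fq.gz, this should leave the subsampled pairs in sub.R1.fq and sub.R2.fq.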

2.8 years ago by cpad0112 21k

score 1 · Answer 1 · 2022-03-08

1

2.8 years ago

GenoMax 148k

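Untested, but this should work using reformat.sh from the BBMap suite. The first command writes the filtered pairs interleaved to stdout, so the second one is told to read interleaved input (int=t).

reformat.sh -Xmx4g in1=R1.fq.gz in2=R2.fq.gz out=stdout.fq.gz maxns=0 | reformat.sh -Xmx4g in=stdin.fq.gz int=t out1=Sampled.R1.fq.gz out2=Sampled.R2.fq.gz samplereadstarget=25000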

If not enough reads remain after the initial filtering, you could add upsample=t to get 25K.
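
If neither BBMap nor cutadapt is at hand, the same idea can be sketched with standard tools only: flatten each 4-line record onto one line, put mates side by side, drop pairs with an N in either sequence, then pick a random subset. This is only a sketch under some assumptions: gzip-compressed, strictly 4-line FASTQ with both files in the same read order, no tab characters in the headers, and GNU shuf available. The function name and output naming are made up for illustration.

```shell
# Hypothetical helper: sample_clean_pairs R1.fq.gz R2.fq.gz n_pairs out_prefix
sample_clean_pairs() {
    r1=$1
    r2=$2
    n=$3
    prefix=$4

    # One record per line: header, sequence, plus line, quality (tab-separated).
    gzip -dc "$r1" | paste - - - - > "$prefix.tmp1"
    gzip -dc "$r2" | paste - - - - > "$prefix.tmp2"

    # Columns 1-4 hold the R1 record and 5-8 the R2 record; 2 and 6 are the sequences.
    paste "$prefix.tmp1" "$prefix.tmp2" |
        awk -F'\t' '$2 !~ /[Nn]/ && $6 !~ /[Nn]/' |
        shuf -n "$n" |
        awk -F'\t' -v p="$prefix" '{
            print $1 "\n" $2 "\n" $3 "\n" $4 > (p ".R1.fq")
            print $5 "\n" $6 "\n" $7 "\n" $8 > (p ".R2.fq")
        }'

    rm -f "$prefix.tmp1" "$prefix.tmp2"
}
```

For example, sample_clean_pairs R1.fq.gz R2.fq.gz 25000 Sampled would write Sampled.R1.fq and Sampled.R2.fq. If fewer than 25,000 clean pairs exist, shuf simply returns all of them.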