Question

remove empty reads from fastq files

2

4.6 years ago

dentepre ▴ 20

Hi,

Before performing STAR mapping, I need to remove empty reads from my FASTQ files.

How can I do this without using bioawk? (I cannot install it on the HPC.) I have bedtools, samtools, and bbmap, but I didn't find a solution in these packages.

Best, d.

RNA-Seq • 3.9k views

updated 2.3 years ago by GenoMax 153k • written 4.6 years ago by dentepre ▴ 20

0

Why can't you add it to your cluster? It is straightforward to compile:

git clone https://github.com/lh3/bioawk.git
cd bioawk
make

Alternatively, use a package manager like miniconda to install it. In the long term, you will have a hard time working on an HPC (or any machine) if you cannot add any software.

4.6 years ago by ATpoint 89k

0

Try cutadapt's minimum-length option (-m / --minimum-length) or seqkit seq with its --min-len option.

4.6 years ago by cpad0112 21k

score 5 · Answer 1 · 2021-01-02

5

4.6 years ago

GenoMax 153k

With BBMap suite:

reformat.sh in=your.fq.gz out=filtered.fq.gz minlength=N

Set N to a reasonable minimum length (very short reads are not useful anyway). Use minlength=1 if you only want to remove empty reads.
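To check whether the filtering did what you expect, you can count the empty reads before and after. The following is only a minimal sketch in plain shell/awk: the function name `count_empty_reads` is made up for illustration, and it assumes standard 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of BBMap): count reads whose sequence line is empty.
# Reads uncompressed FASTQ text on stdin; assumes exactly 4 lines per record.
count_empty_reads() {
    # Line 2 of every 4-line record is the sequence; n + 0 prints 0 when nothing matched.
    awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }'
}

# Example usage (file names are placeholders):
#   gunzip -c your.fq.gz     | count_empty_reads
#   gunzip -c filtered.fq.gz | count_empty_reads
```

If the second count is not 0, the filter did not remove every empty read.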

4.6 years ago by GenoMax 153k

0

how would the command look for two input files?

2.3 years ago by WAGNER • 0

0

I assume you mean a paired-end dataset. For single-end data, simply run the command once for each of the two files.

For paired-end data, use:

reformat.sh in1=R1.fq.gz in2=R2.fq.gz out1=R1.filtered.fq.gz out2=R2.filtered.fq.gz minlength=N
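If BBMap is not an option, the same idea for paired-end data can be sketched in plain shell with paste and awk. This is not what reformat.sh does internally, just an illustration under stated assumptions: the function name `filter_pairs` is hypothetical, both files are gzipped and list mates in the same order, records are exactly 4 lines, headers contain no tab characters, and a pair is dropped if either mate is empty. It writes uncompressed output and uses temporary files, so it needs disk space for large inputs.

```shell
# Hypothetical helper: filter_pairs R1.fq.gz R2.fq.gz OUT1.fq OUT2.fq
# Keeps a pair only when both mates have a non-empty sequence, so the files stay in sync.
filter_pairs() {
    tmp=$(mktemp -d)
    # Put each 4-line record on a single tab-separated line.
    gunzip -c "$1" | paste - - - - > "$tmp/r1.tsv"
    gunzip -c "$2" | paste - - - - > "$tmp/r2.tsv"
    # Join mates side by side: fields 1-4 come from R1, fields 5-8 from R2.
    paste "$tmp/r1.tsv" "$tmp/r2.tsv" |
        awk -F '\t' -v o1="$3" -v o2="$4" '
            $2 != "" && $6 != "" {
                print $1 "\n" $2 "\n" $3 "\n" $4 > o1
                print $5 "\n" $6 "\n" $7 "\n" $8 > o2
            }'
    rm -r "$tmp"
}

# Example usage (file names are placeholders):
#   filter_pairs R1.fq.gz R2.fq.gz R1.filtered.fq R2.filtered.fq
```

Dropping the whole pair is a design choice made here so that downstream tools still see matching read names in both files.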

2.3 years ago by GenoMax 153k

score 4 · Answer 2 · 2021-01-02

4

4.6 years ago

Pierre Lindenbaum 166k

gunzip -c input.fq.gz | paste - - - - | awk -F '\t' '($2!="")' | tr "\t" "\n"
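For readers who want to see what the one-liner does step by step, here is the same pipeline wrapped in a small function with comments. The name `drop_empty_reads` is only illustrative, and the sketch assumes 4-line FASTQ records whose header lines contain no tab characters (a tab in a header would be turned into a line break by the final tr).

```shell
# Hypothetical wrapper around the pipeline above:
# uncompressed FASTQ text in on stdin, filtered FASTQ out on stdout.
drop_empty_reads() {
    # 1. paste joins every 4 lines into one tab-separated line (one read per line).
    # 2. awk keeps only lines whose 2nd field (the sequence) is not empty.
    # 3. tr turns the tabs back into newlines, restoring the 4-line layout.
    paste - - - - | awk -F '\t' '($2 != "")' | tr '\t' '\n'
}

# Example usage, writing a compressed result (file names are placeholders):
#   gunzip -c input.fq.gz | drop_empty_reads | gzip > filtered.fq.gz
```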

4.6 years ago by Pierre Lindenbaum 166k