Remove empty reads from FASTQ files
3.9 years ago
dentepre ▴ 20

Hi,

Before performing STAR mapping, I need to remove empty reads from my FASTQ files.

How can I do this without using bioawk? (I cannot install it on the HPC.) I have bedtools, samtools and bbmap, but I didn't find a solution in these packages.

Best, d.

RNA-Seq • 3.4k views

Why can't you add it to your cluster? It is straightforward to compile:

git clone https://github.com/lh3/bioawk.git
cd bioawk
make

Alternatively, use a package manager such as Miniconda. In the long term, you will have a hard time working on an HPC (or any machine) if you cannot install any software.


Try cutadapt's minimum-length option (-m / --minimum-length) or seqkit seq with the --min-len option.

3.9 years ago
GenoMax 147k

With BBMap suite:

reformat.sh in=your.fq.gz out=filtered.fq.gz minlength=N

Set N to a reasonable number (very short reads are not useful anyway), or to 1 if you only want to remove empty reads.
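To see whether a file contains empty reads at all, and to confirm afterwards that the filter removed them, a plain awk count may help. This is only a rough sketch: it assumes standard 4-line FASTQ records (no wrapped sequences), and count_empty_reads is a made-up helper name, not part of BBMap.

```shell
# Count FASTQ records whose sequence line (line 2 of every 4-line record) is empty.
# Assumption: strict 4-line records, gzip-compressed input.
count_empty_reads() {
    gunzip -c "$1" |
        awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }'
}

# Example usage (file names are placeholders):
#   count_empty_reads your.fq.gz       # before filtering
#   count_empty_reads filtered.fq.gz   # after filtering with minlength=1
```

If the second count is not zero, the input probably does not follow the 4-line layout assumed here.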


How would the command look for two input files?


I assume you mean a paired-end dataset. For single-end data you will simply need to run this twice for the two files.

For paired-end data, use:

reformat.sh in1=R1.fq.gz in2=R2.fq.gz out1=R1.filtered.fq.gz out2=R2.filtered.fq.gz minlength=N
3.9 years ago
gunzip -c input.fq.gz | paste - - - - | awk -F '\t' '($2!="")' | tr "\t" "\n"
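Note that the one-liner above writes uncompressed FASTQ to stdout and treats one file at a time. Running it separately on R1 and R2 of a paired-end dataset could leave the mates out of sync. A possible paired-end variant using only gunzip, paste, awk and gzip is sketched below. It assumes 4-line records, the same number and order of reads in both files, and no tabs or single quotes in headers or paths; filter_empty_pairs and its argument order are hypothetical.

```shell
# Keep a read pair only if BOTH mates have a non-empty sequence, so R1/R2 stay in sync.
# Arguments (hypothetical helper): in_R1.fq.gz in_R2.fq.gz out_R1.fq.gz out_R2.fq.gz
filter_empty_pairs() {
    gunzip -c "$1" | paste - - - - |
    awk -F '\t' \
        -v r2cmd="gunzip -c '$2' | paste - - - -" \
        -v o1="gzip > '$3'" \
        -v o2="gzip > '$4'" '
        {
            # Read the matching R2 record; stop with an error if R2 runs out early.
            if ((r2cmd | getline mate) <= 0) exit 1
            split(mate, m, "\t")
            if ($2 != "" && m[2] != "") {
                print $1 "\n" $2 "\n" $3 "\n" $4 | o1
                print m[1] "\n" m[2] "\n" m[3] "\n" m[4] | o2
            }
        }
        END { close(o1); close(o2) }'
}

# Example usage (file names are placeholders):
#   filter_empty_pairs R1.fq.gz R2.fq.gz R1.filtered.fq.gz R2.filtered.fq.gz
```

For anything beyond a quick fix, the reformat.sh paired-end command shown earlier in the thread is likely the safer choice, since this sketch does no validation of read names.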