Filtering reads based on length
6.2 years ago
alireza346 ▴ 10

I have a FASTQ file (RNA-seq) from which I have already removed the linkers. The sequences in the file now have different lengths. I want to remove the reads shorter than 21 nucleotides and keep the rest. Do you know of a tool that does this?

RNA-Seq • 8.6k views

How did you remove the adapters (linkers)? I hope you used an established tool like Cutadapt. These tools have built-in options to discard reads shorter than a given threshold.


Hi,

You can use the fastaparse.pl script available in the miRDeep2 package.


Does that script work with fastq format files? OP is specifically asking about that format.

6.2 years ago

Try with seqkit:

seqkit seq -m 21 in.fastq > filtered.fastq
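
If seqkit is not installed, the same kind of filter can be sketched in plain Python. This is only an illustration and not how seqkit works internally: the names `read_fastq` and `filter_fastq_by_length` are made up for this example, and the code assumes a well-formed, uncompressed FASTQ with exactly four lines per record.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def filter_fastq_by_length(src: TextIO, dst: TextIO, min_len: int = 21) -> int:
    """Copy records with at least min_len bases to dst; return how many were kept."""
    kept = 0
    for header, seq, plus, qual in read_fastq(src):
        if len(seq) >= min_len:
            dst.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept
```

To use it, open the input and output as text files (for example `open("in.fastq")` and `open("filtered.fastq", "w")`, both placeholder names) and pass them to `filter_fastq_by_length` with `min_len=21`.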
6.2 years ago
GenoMax 147k

Use reformat.sh from the BBMap suite:

reformat.sh in=your_fq.gz out=filt.fq.gz minlength=21

Note: if you have paired-end data, use in1= and in2= for the inputs and out1= and out2= for the outputs.
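
The paired-end note matters because the R1 and R2 files have to stay in sync: if a read is dropped from one file, its mate must be dropped from the other. A rough Python sketch of that idea follows. The function names are hypothetical, the input is assumed to be well-formed uncompressed FASTQ, and the policy shown (discard the pair if either mate is too short) is an assumption for illustration. Check each tool's documentation for how it actually treats pairs.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def filter_pairs_by_length(
    in1: TextIO, in2: TextIO, out1: TextIO, out2: TextIO, min_len: int = 21
) -> int:
    """Write pairs in which both mates have at least min_len bases; return pairs kept."""
    kept = 0
    # zip stops at the shorter input, so mismatched record counts
    # are not detected in this sketch.
    for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
        # Assumed policy: keep the pair only if BOTH mates pass the threshold.
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

Because both outputs are written in the same loop iteration, the mates stay in the same order in the two filtered files.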

18 months ago
geocarvalho ▴ 390

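Another great option is fastp. The example below uses a minimum length of 30; set --length_required 21 for the threshold asked about in the question: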

fastp --length_required 30 -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz

You may also add --detect_adapter_for_pe if adapters are still present, --compression to set the gzip level of the output, --thread to set the number of worker threads, and --html to name the HTML report.
