Tool: TGSFilter, an ultra-fast and efficient tool for filtering and trimming long reads
8 months ago
Huiyang ▴ 190

Dear researchers:

We have developed TGSFilter, a tool for rapid filtering of third-generation sequencing (TGS) long reads. Using 10 threads, it can filter 200 Gb of HiFi reads or 160 Gb of ONT reads within 1 hour and write compressed output, while using less than 200 MB of memory.
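As a rough sanity check, the figures quoted above can be turned into an implied throughput. This is only back-of-the-envelope arithmetic, assuming "Gb" means gigabases and that the run takes the full hour (so the real rate is at least this high); the function name is made up for illustration.

```python
# Back-of-the-envelope throughput implied by the figures quoted in the post.
# Assumption: "Gb" means gigabases (1e9 bases) and the run takes a full hour.
BASES_PER_GB = 1e9


def throughput_mb_per_s(gigabases, hours=1.0, threads=10):
    """Return (total, per-thread) throughput in megabases per second."""
    total = gigabases * BASES_PER_GB / (hours * 3600) / 1e6
    return total, total / threads


hifi_total, hifi_per_thread = throughput_mb_per_s(200)
ont_total, ont_per_thread = throughput_mb_per_s(160)

print(f"HiFi: {hifi_total:.1f} Mb/s total, {hifi_per_thread:.2f} Mb/s per thread")
print(f"ONT:  {ont_total:.1f} Mb/s total, {ont_per_thread:.2f} Mb/s per thread")
```

So the stated numbers correspond to roughly 56 Mb/s (HiFi) and 44 Mb/s (ONT) across 10 threads, as a lower bound.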

(1) TGSFilter automatically identifies adapters, either by alignment against a built-in library of common adapters or by an assembly-based method.

(2) TGSFilter applies separate parameters to adapter sequences found in the middle of a read and to those found at its ends, so that adapters are removed effectively in both cases.

(3) TGSFilter automatically detects regions of unbalanced nucleotide composition at the 5' and 3' ends of reads and trims them.

(4) TGSFilter automatically assesses the mean base quality of the input reads (FASTQ) and sets an appropriate threshold for filtering out low-quality reads.
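To make points (3) and (4) concrete, here is a minimal Python sketch of what an end-composition check and a mean-quality calculation could look like. It is not TGSFilter's actual implementation: the function names, the 0.5 dominance threshold, the Phred+33 encoding, and the use of an arithmetic mean of Phred scores are all assumptions made for illustration.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Arithmetic mean of the Phred scores in a FASTQ quality string.

    Assumes Phred+33 encoding; the tool may average differently.
    """
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def end_base_fractions(reads, end_len=100, from_tail=False):
    """Per-position base fractions over the first (or last) end_len bases.

    Position 0 is always the outermost base of the chosen end.
    """
    counts = [Counter() for _ in range(end_len)]
    for seq in reads:
        window = seq[::-1][:end_len] if from_tail else seq[:end_len]
        for pos, base in enumerate(window.upper()):
            counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append({b: n / total for b, n in counter.items()} if total else {})
    return fractions


def unbalanced_prefix(fractions, max_fraction=0.5):
    """Count leading positions where a single base exceeds max_fraction.

    The threshold is hypothetical; it stops at the first balanced position.
    """
    trim = 0
    for frac in fractions:
        if frac and max(frac.values()) > max_fraction:
            trim += 1
        else:
            break
    return trim


# Toy data: every read starts with the same three bases.
demo_reads = ["AAACGT", "AAAGTC", "AAATCG", "AAACTG"]
demo_trim = unbalanced_prefix(end_base_fractions(demo_reads, end_len=6))
print("bases to trim from the 5' end:", demo_trim)
print("mean quality of 'IIII':", mean_phred("IIII"))
```

In this toy input the first three positions are all A, so the sketch suggests trimming three bases from the 5' end; passing `from_tail=True` would run the same check on the 3' end.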

Below is the help information for TGSFilter:

Usage: tgsfilter -i TGS.raw.fq.gz -o TGS.clean.fq.gz
 Input/Output options:
   -i   <str>   input of fasta/q file
   -o   <str>   output of fasta/q file
 Basic filter options:
   -l   <int>   min length of read to out [1000]
   -L   <int>   max length of read to out
   -q   <int>   min mean base quality [auto]
   -n   <int>   read number for base content check [200000]
   -e   <int>   read end length for base content check [100]
   -5   <int>   trim bases from the front (5') of the read [auto]
   -3   <int>   trim bases from the tail (3') of the read [auto]
 Adapter filter options:
   -a   <str>   adapter sequence file 
   -A           disable reads filter, only for adapter identify
   -N   <int>   read number for adapter identify [200000]
   -E   <int>   read end length for adapter identify [100]
   -k   <int>   kmer size for adapter assembly [19]
   -y   <int>   min assembly adapter length [30]
   -m   <int>   min match length between read and adapter [4]
   -M   <int>   min match length between read middle and adapter [35]
   -s  <float>  min similarity between read end and adapter [0.75]
   -S  <float>  min similarity between read middle and adapter [0.9]
 Other options:
   -t           number of threads [3]
   -h           show help [v1.08]
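For scripted pipelines, a command line can be assembled from the flags listed above. The helper below is a hypothetical wrapper, not part of TGSFilter; it uses only `-i`, `-o`, `-l`, `-q` and `-t` as documented in the help text, and the values 2000 and 10 are arbitrary example settings. It only builds and prints the command, since running it requires `tgsfilter` to be installed.

```python
import shlex


def build_tgsfilter_command(in_path, out_path, min_len=None, min_qual=None, threads=None):
    """Assemble a tgsfilter argument list from flags shown in the help text.

    Options left as None are omitted so the tool's own defaults apply.
    """
    cmd = ["tgsfilter", "-i", in_path, "-o", out_path]
    if min_len is not None:
        cmd += ["-l", str(min_len)]   # min length of read to output
    if min_qual is not None:
        cmd += ["-q", str(min_qual)]  # min mean base quality
    if threads is not None:
        cmd += ["-t", str(threads)]   # number of threads
    return cmd


cmd = build_tgsfilter_command("TGS.raw.fq.gz", "TGS.clean.fq.gz", min_len=2000, threads=10)
print(shlex.join(cmd))

# To actually run it (requires tgsfilter on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with unusual file names.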

The TGSFilter code and manual are available on GitHub:

https://github.com/HuiyangYu/TGSFilter

We would be very grateful for any suggestions as we continue to develop and optimize the software.

genome-assembly filtering reads trimming • 584 views

Nice tool! The commands are simple and it runs quickly.
