Dear researchers:
We have developed TGSFilter, a software tool for rapid filtering of third-generation sequencing (TGS) long reads. Using 10 threads, it can filter 200 Gb of HiFi reads or 160 Gb of ONT reads within 1 hour, and it writes compressed output files. The process requires less than 200 MB of memory.
(1) TGSFilter identifies adapters automatically, either by aligning reads against a built-in library of general adapters or by an assembly-based method.
(2) TGSFilter applies separate parameters to adapter matches in the middle of a read and at the read ends, so that adapter sequences are removed effectively.
(3) TGSFilter automatically detects regions of unbalanced nucleotide distribution at the 5' and 3' ends of reads and trims those regions.
(4) TGSFilter also estimates the average base quality of the input reads (FASTQ) and sets appropriate parameters for filtering out low-quality reads.
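As a rough illustration of points (3) and (4), the sketch below computes a mean Phred quality for a read and looks for a leading run of positions dominated by a single base. It is not TGSFilter's actual implementation: the function names, the arithmetic-mean definition of read quality, the Phred+33 encoding, and the 0.5 dominance threshold are all assumptions made for this example.

```python
from collections import Counter


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores decoded from a FASTQ quality string.

    Assumes Phred+33 encoding; TGSFilter may define mean quality differently.
    """
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def end_base_fractions(reads, end_len=100):
    """Per-position base fractions over the first `end_len` bases of each read."""
    counts = [Counter() for _ in range(end_len)]
    for seq in reads:
        for pos, base in enumerate(seq[:end_len].upper()):
            counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append(
            {base: n / total for base, n in counter.items()} if total else {}
        )
    return fractions


def suggest_5prime_trim(fractions, max_fraction=0.5):
    """Count leading positions where one base exceeds `max_fraction`.

    `max_fraction` is a hypothetical threshold chosen for this sketch.
    """
    trim = 0
    for frac in fractions:
        if frac and max(frac.values()) > max_fraction:
            trim += 1
        else:
            break
    return trim


# Toy input: the first three positions are always 'A', the rest are mixed.
reads = ["AAACGT", "AAAGTC", "AAATCG", "AAACTA"]
fractions = end_base_fractions(reads, end_len=6)
trim_5 = suggest_5prime_trim(fractions)
```

The same composition check can be applied to the 3' end by reversing each read before calling end_base_fractions. In this toy input the first three positions are all 'A', so the sketch suggests trimming three bases from the 5' end.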
Below is the help information for TGSFilter:
Usage: tgsfilter -i TGS.raw.fq.gz -o TGS.clean.fq.gz
Input/Output options:
-i <str> input of fasta/q file
-o <str> output of fasta/q file
Basic filter options:
-l <int> min length of read to out [1000]
-L <int> max length of read to out
-q <int> min mean base quality [auto]
-n <int> read number for base content check [200000]
-e <int> read end length for base content check [100]
-5 <int> trim bases from the front (5') of the read [auto]
-3 <int> trim bases from the tail (3') of the read [auto]
Adapter filter options:
-a <str> adapter sequence file
-A disable reads filter, only for adapter identify
-N <int> read number for adapter identify [200000]
-E <int> read end length for adapter identify [100]
-k <int> kmer size for adapter assembly [19]
-y <int> min assembly adapter length [30]
-m <int> min match length between read and adapter [4]
-M <int> min match length between read middle and adapter [35]
-s <float> min similarity between read end and adapter [0.75]
-S <float> min similarity between read middle and adapter [0.9]
Other options:
-t number of threads [3]
-h show help [v1.08]
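For scripted pipelines, the command line can be assembled from the options listed above. The following is a minimal sketch that uses only flags shown in the help text (-i, -o, -l, -q, -a, -t); the helper name build_tgsfilter_cmd and its default values are hypothetical, and the sketch assumes that a tgsfilter executable is on the PATH when the command is eventually run.

```python
import shlex


def build_tgsfilter_cmd(in_path, out_path, threads=10, min_len=1000,
                        min_qual=None, adapter_file=None, exe="tgsfilter"):
    """Build an argument list for tgsfilter from the documented options.

    Leaving min_qual as None omits -q, so the tool keeps its [auto] setting.
    """
    cmd = [exe, "-i", in_path, "-o", out_path,
           "-l", str(min_len), "-t", str(threads)]
    if min_qual is not None:
        cmd += ["-q", str(min_qual)]
    if adapter_file is not None:
        cmd += ["-a", adapter_file]
    return cmd


cmd = build_tgsfilter_cmd("TGS.raw.fq.gz", "TGS.clean.fq.gz")
print(shlex.join(cmd))

# To run the filter for real (requires tgsfilter to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.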
The TGSFilter code and manual are available on GitHub:
https://github.com/HuiyangYu/TGSFilter
We would be very grateful for any suggestions as we continue to develop and optimize the software.
Nice tool! The software has simple commands and runs quickly.