Trimmomatic doesn't remove ambiguous nucleotides from reads, and no option comes to mind that would let you filter on a threshold such as < 1% ambiguous bases. However, if your reads are of good quality, you can probably just delete the sequences that contain ambiguous bases and check whether the size of your dataset decreases significantly.
For that purpose, seqkit is a good choice. For example, given the input file hasns.fsn,
>seq1
cccctttgannnnnnnccctt
>seq2
ccccatgtgttaaaatatgannnnnnnccctt
>seq3
cccctttgarrryyccctt
>seq4
tactcgacctatgtgttaaaatatgacctt
seqkit grep -s -r -p '[nry]' -i -v hasns.fsn
will output only the sequences that do not contain the characters n, r, or y:
>seq4
tactcgacctatgtgttaaaatatgacctt
(Seqkit will work on fasta or fastq files, but it was simpler to use a fasta file as an example.)
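If you do want something closer to the "< 1% ambiguous bases" threshold rather than discarding every read containing any ambiguous base, one possible workaround is to compute per-read N content with seqkit fx2tab and keep only the IDs below the cutoff. This is only a sketch: reads.fq, keep_ids.txt and filtered.fq are placeholder names, and you should check the flags against your seqkit version.
# print read ID and N content (%), keep IDs with less than 1% N
seqkit fx2tab -n -i -B N reads.fq | awk '$NF+0 < 1 {print $1}' > keep_ids.txt
# extract just those reads
seqkit grep -f keep_ids.txt reads.fq > filtered.fq
# compare read counts before and after filtering
seqkit stats reads.fq filtered.fq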
One potential problem is that you might delete one member of a read pair, which can cause some genome or transcriptome assembly programs to crash. It is probably a good idea, after running seqkit grep and Trimmomatic, to run fastq_pair to eliminate unpaired reads, as sketched below.
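For example, a rough sketch of that clean-up for paired-end data. R1.fq and R2.fq are placeholder names, and the output file names are the ones I would expect fastq_pair to write, so check them against the tool's documentation.
# filter each mate file independently
seqkit grep -s -r -i -v -p '[nry]' R1.fq > R1.filtered.fq
seqkit grep -s -r -i -v -p '[nry]' R2.fq > R2.filtered.fq
# re-pair the two files; fastq_pair should write *.paired.fq files with the
# surviving pairs and *.single.fq files with the orphaned reads
fastq_pair R1.filtered.fq R2.filtered.fq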
You may wish to try BIRCH, which bundles together hundreds of common bioinformatics programs, unified through BioLegato, a sophisticated graphical user interface. BIRCH includes Trimmomatic, seqkit and fastq_pair, along with a wide array of programs for automated processing of reads. At each step in read processing, the output goes to a new directory, which helps keep your many files organized. Examples can be seen in the BIRCH genome and transcriptome tutorials.