Question

Trimmomatic

Asked 10 months ago by Ashok

I have 300 FASTQ files (paired-end data). I did quality checking using FastQC. Now I have to do trimming. My required parameters are: Phred score 30, GC content between 30% and 50%, and less than 1% ambiguous bases. If a run of Ns (e.g. NNN) occurs in the middle of a sequence, I have to omit that sequence. I also need to save the trimmed FASTQ files in a separate directory. Can anyone tell me a Linux command for this operation?

what have you tried so far?

Comment by Pierre Lindenbaum, 10 months ago

Ram · Answer 1 · 2024-01-27

Trimmomatic doesn't remove ambiguous nucleotides from reads, and nothing comes to mind that would let you set a threshold such as < 1% ambiguous bases. However, if your reads are of good quality, you can probably just delete sequences containing ambiguous bases and check whether the size of your dataset decreases significantly.

For that purpose, seqkit is a good choice. For example, given the input file hasns.fsn,

>seq1
cccctttgannnnnnnccctt
>seq2
ccccatgtgttaaaatatgannnnnnnccctt
>seq3
cccctttgarrryyccctt
>seq4
tactcgacctatgtgttaaaatatgacctt

seqkit grep -s -r -p '[nry]' -i -v hasns.fsn

will output only the sequences that contain none of the characters n, r or y (-i makes the match case-insensitive and -v inverts it):

>seq4
tactcgacctatgtgttaaaatatgacctt

(Seqkit will work on fasta or fastq files, but it was simpler to use a fasta file as an example.)
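If you would rather apply the exact thresholds from the question than discard every read containing an ambiguous base, a small custom filter is one option. The sketch below is a hypothetical Python function (`passes_filters` and its helpers are not part of seqkit, Trimmomatic or any other tool) that checks one FASTQ record against those criteria. It assumes Phred+33 quality encoding, interprets "Phred score 30" as a mean read quality of at least 30, and treats an N anywhere except the first or last base as "in the middle"; adjust these assumptions to your own definitions.

```python
# Hypothetical per-read filter; thresholds taken from the question.
# Assumes Phred+33 quality encoding (Illumina 1.8+).

UNAMBIGUOUS = set("ACGT")


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases (ambiguous bases count in the denominator)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of bases that are not A, C, G or T (N, R, Y, ...)."""
    return sum(1 for b in seq.upper() if b not in UNAMBIGUOUS) / len(seq)


def has_internal_n(seq: str) -> bool:
    """True if an N occurs anywhere except the first or last position."""
    return "N" in seq.upper()[1:-1]


def passes_filters(seq: str, quality: str,
                   min_mean_q: float = 30.0,
                   gc_range: tuple = (0.30, 0.50),
                   max_ambiguous: float = 0.01) -> bool:
    """Apply all criteria from the question to one read."""
    if not seq or len(seq) != len(quality):
        return False  # empty or malformed record
    return (mean_phred(quality) >= min_mean_q
            and gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and ambiguous_fraction(seq) < max_ambiguous
            and not has_internal_n(seq))
```

For paired-end data you would read both files of a pair together and keep a pair only when both reads pass, which avoids the orphaned-mate problem described in the next paragraph.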

One potential problem is that you might delete only one member of a read pair, which can cause some genome or transcriptome assembly programs to crash. It is probably a good idea, after running seqkit grep and Trimmomatic, to run fastq_pair to separate out reads whose mate is missing.
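The idea behind re-pairing is to keep a read only if its mate also survived filtering in the other file, and to write both files in the same order. The sketch below illustrates that idea on in-memory records; it is not fastq_pair's implementation, and the `pair_reads` and `read_id` functions, the tuple representation of records and the `/1`/`/2` suffix handling are assumptions for illustration. For large files, fastq_pair itself or a streaming approach is preferable.

```python
from typing import Dict, List, Tuple

# Hypothetical FASTQ record representation: (header, sequence, quality).
Record = Tuple[str, str, str]


def read_id(header: str) -> str:
    """Read name without '@', any comment after whitespace, or a /1 or /2 suffix."""
    name = header.lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_reads(r1: List[Record], r2: List[Record]
               ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (paired R1, paired R2, singletons), keeping R1 order for pairs."""
    r2_by_id: Dict[str, Record] = {read_id(rec[0]): rec for rec in r2}
    paired1, paired2, singles = [], [], []
    matched = set()
    for rec in r1:
        rid = read_id(rec[0])
        if rid in r2_by_id:
            paired1.append(rec)
            paired2.append(r2_by_id[rid])
            matched.add(rid)
        else:
            singles.append(rec)  # R1 read whose mate was removed
    # R2 reads whose mate was removed from R1
    singles.extend(rec for rec in r2 if read_id(rec[0]) not in matched)
    return paired1, paired2, singles
```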

You may wish to try BIRCH, which bundles together hundreds of common bioinformatics programs, unified through BioLegato, a sophisticated graphical user interface. BIRCH includes Trimmomatic, seqkit and fastq_pair, along with a wide array of programs for automated processing of reads. At each step in read processing, the output goes to a new directory, which helps keep your many files organized. Examples can be seen in the BIRCH genome and transcriptome tutorials.
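If you script the processing yourself instead, handling 300 paired files and a separate output directory is mostly bookkeeping. The sketch below assumes files are named like `sample_R1.fastq.gz` / `sample_R2.fastq.gz` (an assumption; change the tags to match your naming) and pairs them up, mapping each input to a path of the same name in a new output directory. The `plan_outputs` function and the `trimmed` directory name used below are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def plan_outputs(in_dir: Path, out_dir: Path,
                 r1_tag: str = "_R1", r2_tag: str = "_R2"
                 ) -> List[Tuple[Path, Path, Path, Path]]:
    """Pair R1/R2 FASTQ files in in_dir and map each to a path in out_dir.

    Returns (in_r1, in_r2, out_r1, out_r2) tuples; raises if an R1 file has no mate.
    """
    out_dir.mkdir(parents=True, exist_ok=True)  # separate directory for trimmed output
    jobs = []
    for r1 in sorted(in_dir.glob(f"*{r1_tag}*.fastq*")):
        r2 = r1.with_name(r1.name.replace(r1_tag, r2_tag, 1))
        if not r2.exists():
            raise FileNotFoundError(f"no mate for {r1.name}: expected {r2.name}")
        jobs.append((r1, r2, out_dir / r1.name, out_dir / r2.name))
    return jobs
```

Each tuple can then be passed to whichever trimming and filtering commands you settle on; writing to a new directory means the raw files are never overwritten.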