Trimmomatic
Ashok, 10 months ago

I have 300 FASTQ files (paired-end data). I did quality checking using FastQC, and now I have to do trimming. My required parameters are: Phred score 30, GC content between 30% and 50%, and ambiguous bases less than 1%. If a run of Ns occurs in the middle of a sequence, I have to omit that sequence. I also need to save the trimmed FASTQ files in a separate directory. Can anyone tell me the Linux command for this?


What have you tried so far?

10 months ago

Trimmomatic doesn't remove ambiguous nucleotides from reads, and nothing comes to mind that would let you set a threshold such as < 1% ambiguous bases. However, if your reads are of good quality, you can probably just delete every sequence that contains ambiguous bases and check whether the size of your dataset decreases significantly.
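If you do want to apply your four thresholds exactly as stated, a short script is one way to do it. Below is a minimal Python sketch, not a tested pipeline. It assumes Phred+33 qualities, reads "Phred score 30" as a mean read quality of at least 30, counts N and the other IUPAC ambiguity codes as ambiguous, and treats an N anywhere except at the read ends as "in the middle". All function names are hypothetical.

```python
"""Minimal sketch: filter FASTQ reads by the thresholds from the question.

Assumptions (not taken from any tool's documentation): Phred+33 qualities,
"Phred 30" = mean read quality >= 30, ambiguous = any IUPAC ambiguity code.
"""
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

AMBIGUOUS = set("NRYKMSWBDHV")  # IUPAC ambiguity codes


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records (no multi-line sequences)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    return sum(ord(c) - offset for c in qual) / len(qual)


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def ambiguous_fraction(seq: str) -> float:
    return sum(b in AMBIGUOUS for b in seq.upper()) / len(seq)


def has_internal_n(seq: str) -> bool:
    """True if an N remains after removing the Ns at both read ends."""
    return "N" in seq.upper().strip("N")


def passes(seq: str, qual: str, min_q: float = 30.0,
           gc_range: Tuple[float, float] = (0.30, 0.50),
           max_ambig: float = 0.01) -> bool:
    if not seq or len(seq) != len(qual):
        return False
    return (mean_phred(qual) >= min_q
            and gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and ambiguous_fraction(seq) < max_ambig
            and not has_internal_n(seq))


def filter_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield only the records that pass every threshold."""
    return (rec for rec in read_fastq(lines) if passes(rec[1], rec[2]))
```

Note that with a 1% limit, a 100 bp read may not contain even one ambiguous base. Filtering R1 and R2 files independently like this can also break read pairs, the same problem that applies to seqkit.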

For that purpose, seqkit is a good choice. For example, given the input file hasns.fsn,

```
>seq1
cccctttgannnnnnnccctt
>seq2
ccccatgtgttaaaatatgannnnnnnccctt
>seq3
cccctttgarrryyccctt
>seq4
tactcgacctatgtgttaaaatatgacctt
```

the command

```
seqkit grep -s -r -p '[nry]' -i -v hasns.fsn
```

will output only the sequences that contain none of the characters n, r or y (matched case-insensitively):

```
>seq4
tactcgacctatgtgttaaaatatgacctt
```

(Seqkit will work on fasta or fastq files, but it was simpler to use a fasta file as an example.)
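If it helps to see what that command does, here is the same inverted, case-insensitive pattern match written in plain Python and run on the example records. This only mirrors the logic of `-p '[nry]' -i -v`; it is not how SeqKit is implemented, and `parse_fasta` and `without_ambiguous` are hypothetical helpers.

```python
import re
from typing import Iterator, List, Tuple

AMBIG = re.compile(r"[nry]", re.IGNORECASE)  # same pattern as -p '[nry]' with -i


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def without_ambiguous(text: str) -> List[str]:
    # Like -v: keep records whose sequence does NOT match the pattern.
    return [name for name, seq in parse_fasta(text) if not AMBIG.search(seq)]


example = """>seq1
cccctttgannnnnnnccctt
>seq2
ccccatgtgttaaaatatgannnnnnnccctt
>seq3
cccctttgarrryyccctt
>seq4
tactcgacctatgtgttaaaatatgacctt
"""
print(without_ambiguous(example))  # ['seq4']
```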

One potential problem is that filtering the R1 and R2 files separately can delete one member of a read pair, which can cause some genome or transcriptome assembly programs to crash. After running seqkit grep and Trimmomatic, it is probably a good idea to run fastq_pair to eliminate non-paired reads.
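The idea behind re-pairing is simple: keep a read only if its mate with the same ID survived in the other file, and set the rest aside as singletons. Here is a minimal Python sketch of that idea (not fastq_pair's actual code). It assumes read IDs either end in `/1` and `/2` or carry the mate number after a space, as in `@ID 1:N:0:ATCACG`, and it holds all R2 records in memory.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_id(header: str) -> str:
    """'@ID/1' or '@ID 1:N:0:ATCACG' -> 'ID' (assumed header styles)."""
    rid = header.lstrip("@").split()[0]
    if rid.endswith(("/1", "/2")):
        rid = rid[:-2]
    return rid


def pair_reads(r1: List[Record], r2: List[Record]):
    """Return (paired R1, paired R2 in the same order, singletons)."""
    mates: Dict[str, Record] = {read_id(rec[0]): rec for rec in r2}
    out1: List[Record] = []
    out2: List[Record] = []
    singles: List[Record] = []
    for rec in r1:
        mate = mates.pop(read_id(rec[0]), None)
        if mate is None:
            singles.append(rec)  # its R2 mate was filtered out
        else:
            out1.append(rec)
            out2.append(mate)
    singles.extend(mates.values())  # R2 reads whose R1 mate was filtered out
    return out1, out2, singles
```

Writing `out1` and `out2` in the same order keeps mates on matching lines of the two output files, which is what most paired-end assemblers expect.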

You may wish to try BIRCH, which bundles together hundreds of common bioinformatics programs, unified through BioLegato, a sophisticated graphical user interface. BIRCH includes Trimmomatic, seqkit and fastq_pair, along with a wide array of programs for automated processing of reads. At each step in read processing, the output goes to a new directory, which helps keep your many files organized. Examples can be seen in the BIRCH genome and transcriptome tutorials.
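If you script the steps yourself rather than use BIRCH, the same one-directory-per-step habit is easy to follow for 300 pairs. A minimal Python sketch, assuming a hypothetical naming convention of `<sample>_R1...fastq[.gz]` / `<sample>_R2...fastq[.gz]` and hypothetical output file names; it only finds the pairs and prepares the output paths, and the actual trimming or filtering step is left to you.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_pairs(in_dir: Path, r1_tag: str = "_R1",
               r2_tag: str = "_R2") -> Iterator[Tuple[str, Path, Path]]:
    """Yield (sample, R1 path, R2 path) for each R1 FASTQ whose mate exists."""
    for r1 in sorted(in_dir.glob(f"*{r1_tag}*.fastq*")):
        r2 = r1.with_name(r1.name.replace(r1_tag, r2_tag, 1))
        if r2.exists():
            yield r1.name.split(r1_tag, 1)[0], r1, r2
        else:
            print(f"warning: no mate found for {r1.name}")


def output_paths(sample: str, out_dir: Path) -> Tuple[Path, Path]:
    """Put trimmed files in a separate directory, never next to the inputs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return (out_dir / f"{sample}_R1.trimmed.fastq",
            out_dir / f"{sample}_R2.trimmed.fastq")
```

For example, looping `for sample, r1, r2 in find_pairs(Path("raw_reads")):` and calling `output_paths(sample, Path("trimmed"))` inside the loop gives each pair its own input and output files, with all results collected in `trimmed/` (both directory names are placeholders).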
