6.7 years ago
Sureshkumar V.
I am trying to filter reads (Illumina RNA-Seq data) with a quality threshold of 30 using the FASTX-Toolkit, but I got the following error: fastq_quality_filter: found invalid nucleotide sequence (GCGGAGWAACCGTTCGGCEACCAGGTGGCATCGCCGCCGAGGGWGCTCCCGTGGCGCGGGCAGTCGTTGACGAACATCTC) on line 85766.
How can I resolve this error?
Thanks in advance.
The sequence contains 'W' (and possibly other unexpected characters), which might be causing the error. You can try other tools, such as FastQC, to check the data.
'W' should be an allowed character, since it is the IUPAC code for a weak base (A or T). I've never seen an 'E', though, and it is not an IUPAC nucleotide code; this might be the problem.
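To see which characters are actually involved, a quick scan of the sequence lines can help. This is only a minimal sketch: it assumes a plain four-line FASTQ layout (no wrapped sequences), and the names `STRICT_CODES`, `IUPAC_CODES`, and `find_unexpected_bases` are made up for illustration. `STRICT_CODES` reflects my assumption that a strict filter accepts only A, C, G, T, and N.

```python
from collections import Counter

# Assumption: a strict filter accepts only these characters.
STRICT_CODES = set("ACGTN")
# Standard IUPAC nucleotide codes, including the ambiguity codes.
IUPAC_CODES = set("ACGTURYSWKMBDHVN")


def find_unexpected_bases(lines, allowed=STRICT_CODES):
    """Yield (line_number, sequence, counts) for each sequence line
    that contains characters outside `allowed`.

    Assumes four-line FASTQ records, so the sequence is always the
    second line of each record.
    """
    for line_number, line in enumerate(lines, start=1):
        if line_number % 4 != 2:
            continue  # skip header, '+' separator, and quality lines
        sequence = line.strip()
        counts = Counter(c for c in sequence.upper() if c not in allowed)
        if counts:
            yield line_number, sequence, counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python scan_fastq.py reads.fastq
    with open(sys.argv[1]) as handle:
        for line_number, sequence, counts in find_unexpected_bases(handle):
            print(line_number, dict(counts), sequence)
```

Passing `allowed=IUPAC_CODES` instead reports only characters that are not valid nucleotide codes at all, which for the read in the error message would be the 'E'.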
There are many more recent alternatives to the rather old FASTX-Toolkit. To name a few: BBDuk, Trim Galore, and Trimmomatic.
How did you get ambiguity codes in your raw RNA-seq data? What technology is this data from, and has it been pre-processed in some way?