Question

Fastq_quality_filter: found invalid nucleotide sequence

Asked 6.8 years ago by Sureshkumar V.

I am trying to filter reads (Illumina RNA-Seq data) with a quality threshold of 30 using the FASTX-Toolkit, but I get this error:

fastq_quality_filter: found invalid nucleotide sequence (GCGGAGWAACCGTTCGGCEACCAGGTGGCATCGCCGCCGAGGGWGCTCCCGTGGCGCGGGCAGTCGTTGACGAACATCTC) on line 85766.
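The error reports a line number, not a read number. Assuming the file uses the standard four-line FASTQ layout (header, sequence, "+" separator, quality) with no wrapped sequence lines, a minimal Python sketch can map that line number back to a record. The function name `locate_fastq_line` is hypothetical, used only for illustration.

```python
from typing import Tuple

# Assumption: strict 4-line FASTQ records, no wrapped sequence lines.
LINE_ROLES = ("header", "sequence", "separator", "quality")


def locate_fastq_line(line_number: int) -> Tuple[int, str]:
    """Map a 1-based file line number to (1-based record number, line role)."""
    if line_number < 1:
        raise ValueError("line numbers are 1-based")
    record = (line_number - 1) // 4 + 1      # which read the line belongs to
    role = LINE_ROLES[(line_number - 1) % 4]  # position within that read
    return record, role


if __name__ == "__main__":
    print(locate_fastq_line(85766))  # → (21442, 'sequence')
```

Under that assumption, line 85766 is the sequence line of read 21442, which is consistent with the error pointing at a nucleotide sequence.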

How do I resolve this error?

Thanks in advance.

Tags: RNA-Seq, Assembly, ngs

Written 6.8 years ago by Sureshkumar V.; updated 6.8 years ago by Sej Modha.

The sequence contains 'W' (and possibly other) characters that might be causing the error. You could try other tools, such as FastQC.

Reply by Sej Modha, 6.8 years ago

'W' should be an allowed character, since it is the IUPAC code for the weak bases (A or T). I've never seen an 'E', though; that might be the problem.
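To check which characters in a read are plain bases, IUPAC ambiguity codes, or not IUPAC codes at all, a small Python sketch like the one below can help. It assumes a DNA alphabet, so U and gap symbols are counted as non-IUPAC; the name `classify_sequence` is hypothetical. Note that it says nothing about which characters fastq_quality_filter itself accepts, only which ones fall outside the IUPAC table.

```python
# Assumption: DNA alphabet only; U and gap symbols ('-', '.') count as non-IUPAC.
PLAIN_BASES = set("ACGT")
AMBIGUITY_CODES = set("RYSWKMBDHVN")  # IUPAC ambiguity codes, N = any base


def classify_sequence(seq: str) -> dict:
    """Split the distinct characters of a read into plain bases,
    IUPAC ambiguity codes, and characters that are not IUPAC codes."""
    bases, ambiguous, invalid = set(), set(), set()
    for ch in seq.upper():
        if ch in PLAIN_BASES:
            bases.add(ch)
        elif ch in AMBIGUITY_CODES:
            ambiguous.add(ch)
        else:
            invalid.add(ch)
    # Sorted strings give a stable, readable result.
    return {
        "bases": "".join(sorted(bases)),
        "ambiguous": "".join(sorted(ambiguous)),
        "invalid": "".join(sorted(invalid)),
    }


if __name__ == "__main__":
    read = ("GCGGAGWAACCGTTCGGCEACCAGGTGGCATCGCCGCCGAGGG"
            "WGCTCCCGTGGCGCGGGCAGTCGTTGACGAACATCTC")
    print(classify_sequence(read))
    # → {'bases': 'ACGT', 'ambiguous': 'W', 'invalid': 'E'}
```

Applied to the read from the error message, this flags 'W' as an ambiguity code and 'E' as a character outside the IUPAC table.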

There are many more recent alternatives to the rather old FASTX-Toolkit, to name a few: BBDuk, Trim Galore, or Trimmomatic.

Reply by michael.ante, 6.8 years ago

How did you get ambiguity codes in your raw RNA-seq data? What technology does this data come from, and has it been pre-processed in some way?

Reply by GenoMax, 6.8 years ago

Answer 1 (2018-03-12)

Answered 6.8 years ago by egeulgen

Your sequence seems to contain ambiguity codes. Simply remove those and you should be fine.
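One way to read "remove those" is to drop every read whose sequence contains anything other than A, C, G, T or N. A minimal Python sketch of that approach follows, assuming strict four-line FASTQ records with no wrapped lines; the script and function names are hypothetical, and the allowed alphabet `ACGTN` is an assumption, not something the FASTX-Toolkit documents here.

```python
import sys
from typing import Iterable, Iterator, List

ALLOWED = set("ACGTN")  # assumption: keep only plain bases and N


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records as lists of stripped lines."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            if not record[0].startswith("@") or not record[2].startswith("+"):
                raise ValueError(f"malformed record starting with {record[0]!r}")
            yield record
            record = []
    if record:
        raise ValueError("file ends with an incomplete record")


def filter_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Drop every record whose sequence has a character outside ALLOWED."""
    for rec in fastq_records(lines):
        if set(rec[1].upper()) <= ALLOWED:
            yield rec


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python filter_ambiguous.py < reads.fastq > clean.fastq
    kept = 0
    for rec in filter_fastq(sys.stdin):
        sys.stdout.write("\n".join(rec) + "\n")
        kept += 1
    print(f"kept {kept} records", file=sys.stderr)
```

Dropping whole reads is the simplest interpretation. For paired-end data it would leave the two mate files out of sync, so in that case replacing the offending characters with N, or filtering both files together, may be the safer choice.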

Comment by egeulgen, 6.8 years ago

You can look up the ambiguity codes here.

Reply by NB, 6.8 years ago

OK, egeulgen, I will try that and let you know.

Reply by Sureshkumar V., 6.8 years ago