Question

fastx toolkit problem: fastq to fasta

10.0 years ago

biolab ★ 1.4k

Hi everyone

I am converting fastq to fasta with the FASTX-Toolkit, using the following command: fastq_to_fasta -i in.fq -o out.fa

However, an error message pops up:

fastq_to_fasta: Invalid quality score value (char '+' ord 43 quality value -21) on line 12

Below are the first 16 lines of in.fq. What's wrong with line 12? Thank you very much!

@ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
AATAGTGGAGTGTATTTCACGTCATTTATCATTATCATTTAGTTCAGTTTTAATTTTATTTAGTTTTGTACAATTTCAATCAAAAACAGGAGTTCAGGGA
+ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
@?@DDDDFHHFD<FFHFEIHGIIGEHIEIIAHHCFHBGHH9DGG@CDDFGICBBFCGIGHGGIGIIIIHEFIGEGFHGGFHIEHICEEHHEEBBCECEED
@ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
GTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTCCGCCGTCAATTCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAA
+ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
BBBFFFFFHHFHHJJJJJJJJJJJJJJJIIJJIJJJFGGIIIIJIJJJJIJIJJHFFFDEEEEDDDDDDDDCDDD@BCBDBDDDDDD9@>BDCDDDDDDD
@ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
GAACCCATGAGGCACGCTGCGTGAGCCGCACCGCGCTGCTACTGGCGTTGGAGGAAGAGCTCCCAAGAGGCACCATCCGCTACTCCTCCAAGATCGTCTC
+ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?EAHEB;99?@?C;?BBA5<(5>@?9?A??B??AB<@?A@B>@BC>9@C??C@<AC?<A<<
@ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
AGGGGAGCCGGCGACCGAAGCCCCGGTGAACGGCGGCCGTAACAATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCCGCACGAAA
+ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
@@@DDDD:F@F:FGII)0-;FF@5AB'5?B;?<6;5B-707B@BB8333802?5>@B>BBBB<5;5>?B:44@4@49@B#####################

fastx • 9.5k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by biolab ★ 1.4k

Hi all,

I'm trying to use the command fastq_to_fasta on fastq files from a MinION run (Nanopore Technologies) because I need a fasta file to continue my data analysis. When I run the command fastq_to_fasta -i fileNanopore.fastq -o .fileNanopore.fasta I get this error: fastq_to_fasta: Error: invalid quality score data on line 2060 (quality_tok = "+"

I don't understand this error, could someone help me?

Thank you in advance,

Best regards
Sara

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 5.7 years ago by sara.dandreano • 0

Did you try adding -Q33 to your command line? Your nanopore data is in Sanger fastq format.

That said, you should use reformat.sh or one of the newer tools for this.

ADD REPLY • link 5.7 years ago by GenoMax 147k

Dear genomax,

Thank you for your reply. I already tried -Q33, as suggested in another question I saw on Biostars, but it did not work. I will try the Phred+33 as h.mon suggests.

Thank you very much!!

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 5.7 years ago by sara.dandreano • 0

The FASTX-Toolkit is very old, and was developed back when Illumina used what is called Phred+64 quality encoding. Later, Illumina moved to the original Sanger Phred+33 encoding, and nowadays I believe every sequencing platform uses Phred+33. Hence fastq_to_fasta has Phred+64 as its default, and you have to use the -Q 33 argument if your file uses the Phred+33 encoding, as genomax pointed out. Read the FASTQ Wikipedia page for more information.
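
If you are unsure which encoding a file uses, a rough check is to look at the lowest character in the quality lines. The Python sketch below is only a heuristic and not part of the FASTX-Toolkit; the function name guess_phred_offset is made up for illustration. It relies on the fact that the +64 encodings never use characters below ';' (ASCII 59), so any lower character implies Phred+33. When no such character is seen, the answer of 64 is only a guess.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) used by FASTQ quality strings."""
    # Lowest ASCII code seen across all quality characters.
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    # Characters below ';' (ASCII 59) cannot occur in the +64 encodings,
    # so seeing one implies Phred+33. Otherwise 64 is only a guess.
    return 33 if lowest < 59 else 64


# The start of the quality string from line 12 of the question's in.fq.
print(guess_phred_offset(["@@<DDDDDFHF?+<AE@GHGG"]))  # 33
```

Pass it the quality lines (every fourth line) of a sample of reads; a handful of high-quality reads alone may not contain a low enough character to decide.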

Be aware that the FASTX-Toolkit is really old and was designed with short reads in mind. It may or may not work for long Nanopore reads, so be sure to double-check the integrity of the reads after the conversion.

ADD REPLY • link 5.7 years ago by h.mon 35k

Dear h.mon, Thank you for the suggestion, I will try with the Phred+33! best regards Sara

ADD REPLY • link 5.7 years ago by sara.dandreano • 0

Phred+33 is selected with the -Q 33 option. Please follow our suggestions and use reformat.sh from the BBMap suite.

reformat.sh in=your.fastq out=new.fa

ADD REPLY • link 5.7 years ago by GenoMax 147k

Ram · Answer 1 · 2014-11-18

Add -Q33 to the command line. The FASTX-Toolkit assumes your fastq file is in Phred+64 format, whereas your file uses an offset of 33. When it subtracts 64 from 43, the decimal ASCII value of the "+" character, it gets a negative quality value of -21 and therefore throws an error. Read about the different encodings here: http://en.wikipedia.org/wiki/FASTQ_format
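
The arithmetic behind the error message can be reproduced in a few lines of Python. This is only an illustration of the offset subtraction, not the toolkit's actual code, and the helper name quality_value is made up here.

```python
def quality_value(char, offset):
    """Decode one FASTQ quality character using the given ASCII offset."""
    return ord(char) - offset


print(ord("+"))                 # 43
print(quality_value("+", 64))   # -21, the invalid value from the error message
print(quality_value("+", 33))   # 10, a valid score under Phred+33
```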

Ram · Answer 2 · 2014-11-18

10.0 years ago

GouthamAtla 12k

Correct Usage:

fastq_to_fasta -Q33 -i in.fq -o out.fa

A simple Linux command would also do it (substr drops the leading "@" so the header does not become ">@..."):

awk 'NR%4==1 { print ">" substr($0, 2) } NR%4==2 { print }' in.fq > out.fa
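
For comparison, the same idea can be sketched in Python. Like the one-liner, it assumes every record is exactly four lines with no wrapped sequences, and it does no quality checking at all. The function name fastq_to_fasta_lines and the read in the example are hypothetical.

```python
def fastq_to_fasta_lines(fastq_lines):
    """Yield FASTA lines from FASTQ lines, assuming 4 lines per record."""
    for index, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if index % 4 == 0:
            # Header line: replace the leading '@' with '>'.
            yield ">" + line[1:]
        elif index % 4 == 1:
            # Sequence line: copy unchanged.
            yield line
        # The '+' line and the quality line are skipped.


# Hypothetical single-read example.
record = ["@read1 length=8\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
for fasta_line in fastq_to_fasta_lines(record):
    print(fasta_line)
```

To convert a file, iterate over the open input file and write each yielded line, followed by a newline, to the output file.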

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by GouthamAtla 12k

Thank you Ashutosh and Geek_y, your comments are really helpful. However, I have one further question: I tried two approaches to convert fastq to fasta. One is fastq_to_fasta -Q33 -i in.fq -o out.fa, and the other is sed -n '1~4s/^@/>/p; 2~4p' in.fq > out.fa. I found that the number of lines in the two out.fa files differs. Where is the problem? THANKS!

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by biolab ★ 1.4k

I think if you pass the -n option (keep sequences with unknown (N) nucleotides) to fastq_to_fasta, you will get the same number.
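
One way to test this explanation is to count how many reads contain an N and compare that with the difference between the two outputs. The Python sketch below assumes four-line records; the function name count_reads_with_n and the example reads are made up. If the outputs differ only because of dropped N reads, and if each read takes one header line and one sequence line in out.fa, the line counts should differ by twice this number.

```python
def count_reads_with_n(fastq_lines):
    """Count reads whose sequence contains N, assuming 4 lines per record."""
    count = 0
    for index, line in enumerate(fastq_lines):
        # The sequence is the second line of each four-line record.
        if index % 4 == 1 and "N" in line.upper():
            count += 1
    return count


# Hypothetical two-read example: only the second read contains an N.
reads = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
         "@r2\n", "ACNT\n", "+\n", "II#I\n"]
print(count_reads_with_n(reads))  # 1
```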

ADD REPLY • link 3.6 years ago by bjwiley23 ▴ 40

Ram · Answer 3 · 2014-11-19

10.0 years ago

Brian Bushnell 20k

I suggest you try my reformat tool, which is (as far as I know) the fastest converter, at over 500MB/s. It can handle various conversions (fastq, fasta+qual, fasta, sam, scarf, gzip, interleaved, dual-file, etc); it autodetects quality encoding, and can change between quality formats.

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Brian Bushnell 20k