fastx toolkit problem: fastq to fasta
3
0
10.0 years ago
biolab ★ 1.4k

Hi everyone

I am converting FASTQ to FASTA with the FASTX-Toolkit, using the following command: fastq_to_fasta -i in.fq -o out.fa

However, an error message pops up:

fastq_to_fasta: Invalid quality score value (char '+' ord 43 quality value -21) on line 12

Here are the first 16 lines of in.fq. What's wrong with line 12? Thank you very much!

@ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
AATAGTGGAGTGTATTTCACGTCATTTATCATTATCATTTAGTTCAGTTTTAATTTTATTTAGTTTTGTACAATTTCAATCAAAAACAGGAGTTCAGGGA
+ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
@?@DDDDFHHFD<FFHFEIHGIIGEHIEIIAHHCFHBGHH9DGG@CDDFGICBBFCGIGHGGIGIIIIHEFIGEGFHGGFHIEHICEEHHEEBBCECEED
@ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
GTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTCCGCCGTCAATTCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAA
+ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
BBBFFFFFHHFHHJJJJJJJJJJJJJJJIIJJIJJJFGGIIIIJIJJJJIJIJJHFFFDEEEEDDDDDDDDCDDD@BCBDBDDDDDD9@>BDCDDDDDDD
@ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
GAACCCATGAGGCACGCTGCGTGAGCCGCACCGCGCTGCTACTGGCGTTGGAGGAAGAGCTCCCAAGAGGCACCATCCGCTACTCCTCCAAGATCGTCTC
+ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?EAHEB;99?@?C;?BBA5<(5>@?9?A??B??AB<@?A@B>@BC>9@C??C@<AC?<A<<
@ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
AGGGGAGCCGGCGACCGAAGCCCCGGTGAACGGCGGCCGTAACAATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCCGCACGAAA
+ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
@@@DDDD:F@F:FGII)0-;FF@5AB'5?B;?<6;5B-707B@BB8333802?5>@B>BBBB<5;5>?B:44@4@49@B#####################
fastx • 9.5k views
0

Hi all,

I'm trying to use the fastq_to_fasta command on FASTQ files from a MinION run (Nanopore Technologies) because I need a FASTA file to continue my data analysis. When I run fastq_to_fasta -i fileNanopore.fastq -o .fileNanopore.fasta I get this error: fastq_to_fasta: Error: invalid quality score data on line 2060 (quality_tok = "+"

I don't understand this error, could someone help me?

Thank you in advance,

Best regards
Sara

2

Did you try adding -Q33 to your command line? Your nanopore data is in Sanger fastq format.

That said, you should use reformat.sh or one of the newer tools for this.

0

Dear genomax,

Thank you for your reply. I already tried -Q33, as suggested in another question I saw on Biostars, but it did not work. I will try Phred+33 as h.mon suggests.

Thank you very much!!

1

The FASTX-Toolkit is very old and was developed back when Illumina used what is called Phred+64 quality encoding. Later, Illumina moved to the original Sanger Phred+33 encoding, and nowadays I believe every sequencing platform uses Phred+33. Hence fastq_to_fasta has Phred+64 as its default, and you have to use the -Q 33 argument if your file uses the Phred+33 encoding, as genomax pointed out. Read the FASTQ Wikipedia page for more information.

Be aware that the FASTX-Toolkit is really old and was designed with short reads in mind. It may or may not work for long Nanopore reads, so be sure to double-check the integrity of the reads after the conversion.
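If you are not sure which encoding a file uses, the lowest quality character usually gives it away. Here is a rough sketch in Python. The function name guess_phred_offset is made up for illustration and is not part of any toolkit. It relies on the fact that the +64 variants never use characters below ';' (ASCII 59), so anything lower points to Phred+33. If no such character shows up, the sketch simply reports that it cannot tell.

```python
def guess_phred_offset(quality_lines):
    """Return 33 if the quality lines can only be Phred+33, else None.

    Hypothetical helper: characters below ';' (ASCII 59) cannot occur
    in the +64 encodings, so seeing one implies an offset of 33.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None  # no quality data, nothing to decide
    if min(codes) < 59:
        return 33
    return None  # ambiguous: the characters fit more than one encoding


# Quality line 12 from the question contains '+', '-', '(' and digits.
print(guess_phred_offset(["@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?"]))
```

For the file in the question this reports 33, which matches the advice to pass -Q33.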

0

Dear h.mon,

Thank you for the suggestion, I will try with Phred+33!

Best regards,
Sara

1

Phred+33 is selected with the -Q 33 option. Please follow our suggestions and use reformat.sh from the BBMap suite.

reformat.sh in=your.fastq out=new.fa
2
10.0 years ago

Add -Q33 to the command line. The FASTX-Toolkit assumes your FASTQ file is in Phred+64 format, whereas your file has an offset of 33. So when it subtracts 64 from 43, which is the decimal ASCII value of the "+" character, it gets a negative value of -21 and therefore throws an error. Read about the different encodings here: http://en.wikipedia.org/wiki/FASTQ_format
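The arithmetic is easy to check yourself. The short Python sketch below decodes the start of the quality string from line 12 of the question with both offsets. The helper name decode is only illustrative. Note that earlier characters such as '<' also decode to small negative values under Phred+64. The posted error suggests the tool tolerates those, but the exact cutoff it uses is not stated in this thread.

```python
# Start of the quality string on line 12 of in.fq, as posted in the question.
QUAL = "@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?"


def decode(qual, offset):
    """Turn an ASCII quality string into Phred scores for the given offset."""
    return [ord(ch) - offset for ch in qual]


print(ord("+"))              # 43
print(decode("+", 64)[0])    # -21, the value in the error message
print(decode("+", 33)[0])    # 10, a valid score under Phred+33
print(decode(QUAL, 64)[12])  # -21, the '+' is the 13th character of the line
```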

1
10.0 years ago

Correct Usage:

fastq_to_fasta -Q33 -i in.fq -o out.fa

Simple Linux commands would also do that (the substr call drops the leading "@" so the header starts with ">" only):

awk 'NR%4==1 { print ">" substr($0, 2) } NR%4==2 { print }' in.fq > out.fa
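The same four-lines-per-record idea can be sketched in Python if you prefer that to a one-liner. This is a minimal sketch, not a replacement for a real parser: it assumes every record is exactly four lines (no wrapped sequences), and the function name fastq_to_fasta here is just an illustrative Python function, not the FASTX-Toolkit program.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write header and sequence of each 4-line FASTQ record as FASTA.

    Assumes strict 4-line records. Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: " + header.rstrip())
        seq = fastq_handle.readline()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n")
        fasta_handle.write(seq.rstrip("\n") + "\n")
        count += 1
    return count


# Tiny in-memory example; real use would pass open file handles instead.
sample = "@r1 desc\nACGT\n+\nIIII\n@r2\nGGNA\n+r2\n!!!!\n"
out = io.StringIO()
print(fastq_to_fasta(io.StringIO(sample), out))
print(out.getvalue(), end="")
```

Because the quality line is never decoded, this approach does not care whether the file is Phred+33 or Phred+64.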
0

Thank you Ashutosh and Geek_y, your comments are really helpful. However, I have one further question. I tried two approaches to convert FASTQ to FASTA: one is fastq_to_fasta -Q33 -i in.fq -o out.fa, and the other is sed -n '1~4s/^@/>/p; 2~4p' in.fq > out.fa. The two out.fa files have different numbers of lines. Where is the problem? Thanks!

0

I think if you pass the -n option (keep sequences with unknown (N) nucleotides) to fastq_to_fasta, you will get the same number of lines. By default, fastq_to_fasta discards sequences that contain N.
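To check whether N-containing reads explain the difference, you can count them. The sketch below is a rough Python check. The function name count_n_reads is made up for illustration, and it assumes strict 4-line FASTQ records. If discarded N reads are the only cause and each FASTA record takes two lines, the two out.fa files should differ by twice the number of reads reported here.

```python
def count_n_reads(fastq_lines):
    """Return (total_reads, reads_with_N) for strict 4-line FASTQ records."""
    total = 0
    with_n = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # second line of each record is the sequence
            total += 1
            if "N" in line.upper():
                with_n += 1
    return total, with_n


# Small in-memory example; for a real file, pass the open file handle.
records = ["@r1", "ACGT", "+", "IIII", "@r2", "GGNA", "+", "IIII"]
total, with_n = count_n_reads(records)
print(total, with_n, 2 * with_n)
```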

0
10.0 years ago

I suggest you try my reformat tool, which is (as far as I know) the fastest converter, at over 500MB/s. It can handle various conversions (fastq, fasta+qual, fasta, sam, scarf, gzip, interleaved, dual-file, etc); it autodetects quality encoding, and can change between quality formats.
