Convert Illumina Reads To Sanger Score
13.6 years ago

What tool do you use to convert Illumina paired-end reads (Illumina's FASTQ is encoded in ASCII-64) to Sanger scores (ASCII-33)?

I am looking at two methods included in maq (both written by lh3). Do you use one of these methods, or would you recommend another tool?

illumina short next-gen sequencing • 16k views

fq_all2std.pl is outdated...


What does that mean? Is the script not suitable?


I think converting to the sanger scale should be the first step.


@lh3: Thanks for the info. I am wondering whether I should do the data cleanup before or after converting the Illumina scores to Sanger scores. Please let me know your thoughts.


Thanks @lh3 !!


If you run a batch of files in mixed formats (some Illumina, some Sanger) through maq or seqret to bring them all to a fixed ASCII-33 format, will these two tools skip the files that are already in Sanger (ASCII-33) format and convert only the ASCII-64 files to ASCII-33?
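
Neither tool's behaviour on mixed input is confirmed in this thread, so one cautious option is to check each file's offset yourself before converting. Below is a rough Python sketch of such a check. The function names and the cut-offs (59 and 74) are my own assumptions for illustration, and it assumes plain four-line FASTQ records.

```python
def quality_lines(path):
    """Yield the quality string of each record in a four-line FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 3:  # 4th line of every record holds the qualities
                yield line.rstrip("\r\n")


def guess_quality_offset(qualities):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings.

    Returns None when the observed range is compatible with both encodings.
    """
    lowest, highest = 255, 0
    for quality in qualities:
        for ch in quality:
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    # Codes below 59 (';') do not occur in ASCII-64 files, where even the
    # lowest Solexa score (-5) maps to 59.
    if lowest < 59:
        return 33
    # Heuristic: codes above 74 ('J', Q41 at offset 33) are treated as
    # ASCII-64, although Sanger encoding can in principle go higher.
    if highest > 74:
        return 64
    return None
```

Usage would be something like `guess_quality_offset(quality_lines("reads.fastq"))`, where the file name is a placeholder. If it returns None, the quality range fits both encodings and you will need to check how the data was produced.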

13.5 years ago

We use emboss seqret:


$EMBOSS_HOME/seqret fastq-illumina::phred64Data.fastq fastq::phred33Data.fastq

Is there documentation for which FASTQ dialects seqret supports and how to reference those dialects in a seqret command? I'm not seeing it in the EMBOSS documentation.

13.6 years ago
Farhat ★ 2.9k

Galaxy's FASTQ groomer will do this job if you don't mind the web interface.


Thanks Farhat. I am looking at a non-Galaxy solution at the moment.

13.6 years ago
brentp 24k

I haven't used this particular tool, but here is a tool built with Jim Kent's libraries to do the conversion 64to33 (or 33to64).

It comes with a makefile and all the includes necessary so it should be quite fast.

EDIT: there's also a very nice C-API in the Kent-tools: https://github.com/jstjohn/KentLib/blob/master/lib/fastq.c The function signature looks like:

inline void phred64ToPhred33( char * p64, int l)

So it should be easy to use.
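
For anyone who does not read C: converting Illumina 1.3+ (ASCII-64) qualities to Sanger (ASCII-33) is just a fixed shift of 31 on every quality character. Here is a minimal Python sketch of that idea. It is not the KentLib code, the function names are made up for illustration, and it assumes four-line FASTQ records with Phred (not old Solexa) scores.

```python
def phred64_to_phred33(quality):
    """Shift an ASCII-64 quality string to ASCII-33 (64 - 33 = 31)."""
    shifted = []
    for ch in quality:
        code = ord(ch) - 31
        if code < 33:
            # A result below '!' means the input was not Phred+64
            # (for example, it is already ASCII-33 or uses Solexa scores).
            raise ValueError(f"quality character {ch!r} is not valid Phred+64")
        shifted.append(chr(code))
    return "".join(shifted)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every 4th line (the qualities) converted."""
    for index, line in enumerate(lines):
        if index % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line
```

To convert a file you could open the input and output and call `out.writelines(convert_fastq_lines(src))`. The ValueError is a deliberate safety net so that a file that is already in Sanger format is not silently shifted a second time.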


Thanks Brent. I noticed that you mentioned a potential issue with paired-end reads that is not handled by fastx_toolkit (Filtering Paired End Reads). Does this tool take care of that?


If you're just converting, not filtering, that won't be an issue. If you filter after the conversion (for whatever reason), then yes, you'll probably have to figure out how to make sure you keep either both reads of a pair or neither.
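
To illustrate the "both or neither" point, here is a small Python sketch of filtering that decides per pair rather than per read. The record layout (name, sequence, quality tuples), the function names, and the mean-quality criterion are all assumptions chosen for the example. Qualities are taken to be ASCII-33 already.

```python
def mean_quality_at_least(threshold, offset=33):
    """Build a predicate that accepts reads whose mean quality >= threshold."""
    def keep(read):
        _name, _sequence, quality = read
        if not quality:
            return False
        mean = sum(ord(ch) - offset for ch in quality) / len(quality)
        return mean >= threshold
    return keep


def filter_pairs(pairs, keep):
    """Yield (read1, read2) only when both mates pass the predicate."""
    for read1, read2 in pairs:
        if keep(read1) and keep(read2):
            yield read1, read2
```

Because a pair is dropped as soon as either mate fails, the two output files stay in sync, which is what paired-end aligners generally expect.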


Thanks. I have posted another question on filtering. I am not sure whether filtering the reads before or after the QC makes much difference.


I'm not so familiar with C. If I want to use this script to convert fastq illumina quality score to fastq sanger quality score, what command should I run?

13.5 years ago
Weronika ▴ 300

You can use the HTSeq package in Python: http://www-huber.embl.de/users/anders/HTSeq/doc/sequences.html#sequences. It will read FASTQ files with any of the common quality encodings, but always write using the Sanger (Phred) encoding.

13.5 years ago
Bioquant ▴ 160

A quick and dirty solution would be to make a FASTA file with Ns, or any other rarely occurring homopolymer sequence, of length equal to your read length. Align your FASTQ file against this reference with any quality-aware aligner like BWA or Bowtie to get a BAM file (you can parallelize it for speed), telling the aligner that the input is ASCII-64 (for example, -I for bwa aln or --phred64-quals for Bowtie). By definition, quality scores in a BAM file are recorded as Sanger scores. All the reads will be reported only once, as unaligned reads. You can then use Picard to get the FASTQ back from the BAM file with Sanger scores!

Note: there is a difference in the way quality scores are recorded in Illumina FASTQ files before and after version 1.3 of Casava.

http://en.wikipedia.org/wiki/FASTQ_format
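
As described on that Wikipedia page, the older files store Solexa scores rather than Phred scores, so a plain shift of the offset is not enough for them. The two scales are related by Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). Below is a short Python sketch of that mapping. The function names are mine, and rounding to the nearest integer is an assumption about how you want to store the result.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa64_to_sanger(quality):
    """Convert a Solexa+64 quality string to a Sanger (Phred+33) string."""
    converted = []
    for ch in quality:
        q_solexa = ord(ch) - 64            # Solexa scores can be as low as -5
        q_phred = round(solexa_to_phred(q_solexa))
        converted.append(chr(q_phred + 33))
    return "".join(converted)
```

The two scales differ noticeably only at low quality. At high quality they are practically identical, which is why the difference is easy to miss.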

12.4 years ago

For anyone dealing with the various FASTQ encoding schemes and looking to sanity-check their method of conversion, or who just wants to look up the Phred score for a specific ASCII code under one of the schemes: I have found this blog entry extremely useful. It provides conversion tables for Sanger, Illumina 1.3+ and Solexa, listing the Phred score (Q), ASCII glyph and base error.
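
If you only need a quick lookup for the two Phred-based schemes, the table entries can also be computed directly: Q is the ASCII code minus the offset, and the base error probability is 10^(-Q/10). A tiny Python sketch follows. The function name is made up, and it deliberately does not cover Solexa scores, which use a different error formula.

```python
def phred_lookup(glyph, offset=33):
    """Return (Q, error probability) for one quality character.

    Use offset=33 for Sanger and offset=64 for Illumina 1.3+.
    """
    q = ord(glyph) - offset
    if q < 0:
        raise ValueError(f"{glyph!r} is below the offset {offset}")
    return q, 10 ** (-q / 10)
```

For example, `phred_lookup("I")` and `phred_lookup("h", offset=64)` both describe the same quality, Q40, which makes this a handy check that a conversion did what you expected.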


damn, the link you posted has been vandalized!


Is nothing sacred? I emailed the author to let him know.


Fixed. Site is back up.
