Question

Guessing The Quality Scale In Fastq Files

9

13.8 years ago

Manuel ▴ 410

Is there an easy way to guess the scale, given a sufficiently large FASTQ file?

The best would be some working code that I could learn from. However, both BioPerl and BioPython appear not to contain guessing code.

fastq quality • 13k views

ADD COMMENT • link updated 24 months ago by Ram 44k • written 13.8 years ago by Manuel ▴ 410

score 6 · Answer 1 · 2011-03-22

6

13.8 years ago

brentp 24k

Have you read the BioPython code here? It's the best explanation of the quality scores I've seen.

There's also a nice text graphic about two-thirds of the way down the Wikipedia page.

Finally, FastQC guesses the encoding of your quality scores, so you could look at the Java code.

ADD COMMENT • link 13.8 years ago by brentp 24k

0

Thanks. BioPython does not have guessing code, though, right? FastQC just looks at the lowest quality character seen. I guess that's the most promising approach, then, maybe augmented by checking an upper limit, too.

ADD REPLY • link 13.8 years ago by Manuel ▴ 410
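The idea in the reply above (look at the lowest quality character seen, and also check an upper limit) could be sketched roughly as follows. This is only an illustration, not FastQC's actual code: the function names are hypothetical, the input is assumed to be plain four-line FASTQ records, and the thresholds 59, 64 and 74 are assumptions taken from the commonly quoted ASCII ranges (Phred+33 up to about 'J', Solexa+64 starting at ';', Phred+64 starting at '@').

```python
from itertools import islice
from typing import Iterable, Tuple


def quality_range(fastq_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (lowest, highest) ASCII code over all quality lines.

    Assumes plain four-line FASTQ records, so the quality string is
    every fourth line, starting with the fourth.
    """
    lo, hi = 127, 0
    for line in islice(fastq_lines, 3, None, 4):
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if lo > hi:
        raise ValueError("no quality characters found")
    return lo, hi


def guess_encoding(fastq_lines: Iterable[str]) -> str:
    """Heuristic guess: 'phred33', 'phred64', 'solexa64' or 'ambiguous'."""
    lo, hi = quality_range(fastq_lines)
    if lo < 59:
        # Characters below ';' cannot occur in either +64 encoding.
        return "phred33"
    if hi > 74:
        # Assumed upper limit for typical Phred+33 data ('J', Q41).
        return "solexa64" if lo < 64 else "phred64"
    # Everything seen fits both offsets; the data cannot decide.
    return "ambiguous"
```

Called as `guess_encoding(open("reads.fastq"))` (the file name is a placeholder), it returns "ambiguous" whenever every character seen falls in the range shared by both offsets, instead of silently picking one.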

score 3 · Answer 2 · 2011-03-23

3

13.8 years ago

Mikael Huss 4.8k

Here is a Perl script for guessing the quality scale:

https://www.uppnex.uu.se/content/check-fastq-quality-score-format

ADD COMMENT • link 13.8 years ago by Mikael Huss 4.8k

1

Here is the new link for this Perl tool: http://www.uppmax.uu.se/userscript/check-fastq-quality-score-format

It has been improved recently.

-- update --

You can find it in this repository, under the name fastq_guessMyFormat.pl: https://github.com/NBISweden/GAAS/tree/master/annotation/Tools/Util

Here is a link to download it directly.

ADD REPLY • link updated 2.3 years ago by Ram 44k • written 9.3 years ago by Juke34 9.0k

0

That link is now broken as well.

ADD REPLY • link 7.1 years ago by Yahan ▴ 400

0

Thanks, updated now.

ADD REPLY • link 7.1 years ago by Juke34 9.0k

score 2 · Answer 3 · 2011-03-22

2

13.8 years ago

Pierre Lindenbaum 164k

Does the FASTX-Toolkit meet your needs? http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_boxplot_usage

ADD COMMENT • link updated 11.9 years ago by Istvan Albert 102k • written 13.8 years ago by Pierre Lindenbaum 164k

1

Hm, I would like to do this programmatically. I think something like the FastQC guesser looks more promising. Thanks, though.

ADD REPLY • link 13.8 years ago by Manuel ▴ 410

score 2 · Answer 4 · 2011-04-19

2

13.7 years ago

Ryan Thompson ★ 3.6k

I wrote a Python-based FASTQ quality guesser: https://github.com/DarwinAwardWinner/fastqident

It uses BioPython's FASTQ parser, so it will work on anything that BioPython can parse.

ADD COMMENT • link 13.7 years ago by Ryan Thompson ★ 3.6k

0

I am getting 404'd.

ADD REPLY • link 13.4 years ago by Jeremy Leipzig 22k

0

Looks good, but it doesn't install correctly. The module "placsupport" cannot be found in PyPI.

ADD REPLY • link 11.7 years ago by xapple ▴ 230

0

The placsupport module can be found at https://github.com/DarwinAwardWinner/placsupport

ADD REPLY • link updated 24 months ago by Ram 44k • written 9.6 years ago by Keith Callenberg ▴ 960

score 2 · Answer 5 · 2011-08-18

Isn't that solving the wrong problem? The guessing code in FastQC looks fragile: it simply looks at the smallest character code used for qualities, so it depends on actually seeing low-quality bases.
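The fragility is easy to reproduce. The sketch below (hypothetical helper names; the cap of Q41 is an assumption borrowed from recent Illumina output) lists which offsets decode a quality string into a plausible score range. A read made only of high-quality Phred+33 characters is equally valid as low-quality Phred+64, so no amount of inspection can separate the two.

```python
from typing import List


def decode(quality: str, offset: int) -> List[int]:
    """Turn a quality string into integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


def plausible_offsets(quality: str, max_score: int = 41) -> List[int]:
    """Return every offset (33 or 64) under which all scores are in range.

    max_score=41 is an assumed upper bound, not part of any FASTQ spec.
    """
    return [
        offset
        for offset in (33, 64)
        if all(0 <= q <= max_score for q in decode(quality, offset))
    ]
```

For example, `plausible_offsets("FGHI")` accepts both 33 (scores 37 to 40) and 64 (scores 6 to 9), while a string containing "!" can only be offset 33.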

I believe you should get the correct encoding from extra knowledge (i.e. knowing which version of which program generated the file, say from some log file), and then convert to a well-specified format (e.g. BAM) once. Please don't perpetuate the practice of guessing at the details of underspecified formats.
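As a rough illustration of the "convert once, with known encoding" approach: if the log or pipeline version tells you the file is Phred+64, re-encoding is a fixed shift of 31 per character. The sketch below writes Phred+33 FASTQ rather than BAM (a simplification on my part; Phred+33 is the offset SAM uses for its quality field). Function names are hypothetical and four-line records are assumed. It deliberately refuses Solexa-style characters below '@', since those are not Phred scores and need a different conversion.

```python
from typing import Iterable, Iterator


def phred64_to_phred33(quality: str) -> str:
    """Re-encode one quality string from offset 64 to offset 33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            # Not valid Phred+64; the stated encoding must be wrong.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        converted.append(chr(score + 33))
    return "".join(converted)


def convert_records(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded.

    Assumes plain four-line records; other lines pass through unchanged.
    """
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line
```

Because the offset is supplied by the caller's knowledge rather than inferred, a file with the wrong declared encoding fails loudly instead of being silently misread.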

score 0 · Answer 6 · 2011-08-18

0

13.4 years ago

Sequencegeek ▴ 740

Like Ryan, I have a Python-based FASTQ quality guesser, if you would like to use it. It is plain Python (no BioPython). PM me if interested.

ADD COMMENT • link 13.4 years ago by Sequencegeek ▴ 740