Guessing The Quality Scale In Fastq Files
13.7 years ago
Manuel ▴ 410

Is there an easy way to guess the scale, given a sufficiently large FASTQ file?

The best would be some working code that I could learn from. However, both BioPerl and BioPython appear not to contain guessing code.

fastq quality • 13k views
13.7 years ago
brentp 24k

Have you read the Biopython code here? It's the best explanation of the quality scores I've seen.

There's also a nice text graphic about two-thirds of the way down the Wikipedia page.

Finally, FastQC guesses the encoding of your quality scores, so you could look at the Java code.


Thanks. BioPython does not have guessing code, though, right? FastQC just looks at the lowest quality value seen. I guess that's the most promising approach, then, maybe augmented by checking an upper limit, too.
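A minimal sketch of that idea in plain Python, not FastQC's actual code. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines). The function name `guess_offset` and both thresholds are illustrative assumptions: anything below ASCII 59 (`;`, the lowest Solexa+64 character) can only be offset 33, while anything above ASCII 74 (`J`, Q41 in Phred+33) is taken as a hint for offset 64. The upper limit is only a heuristic, since some Phred+33 files contain higher qualities.

```python
import itertools

def guess_offset(fastq_lines, max_records=100000):
    """Return 33, 64, or None (ambiguous) from the observed quality characters.

    Hypothetical helper: assumes 4-line FASTQ records, so every 4th line
    (starting with the 4th) is a quality string.
    """
    lo, hi = 255, 0
    qual_lines = itertools.islice(fastq_lines, 3, None, 4)
    for qual in itertools.islice(qual_lines, max_records):
        codes = qual.rstrip("\r\n").encode("ascii")
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if lo < 59:
        # Lower than any offset-64 encoding (Solexa+64 starts at ';').
        return 33
    if hi > 74:
        # Assumed upper limit: above 'J' is unusual for Phred+33 data.
        return 64
    # Only mid-range characters were seen; the data cannot decide.
    return None
```

Called as `guess_offset(open("reads.fastq"))` (the file name is a placeholder), it returns `None` when every character falls in the overlapping range, which is exactly the case where a file contains no low-quality bases.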

13.7 years ago

Here is a Perl script for guessing the quality scale:

https://www.uppnex.uu.se/content/check-fastq-quality-score-format


Here is the new link for this Perl tool: http://www.uppmax.uu.se/userscript/check-fastq-quality-score-format

It has been improved recently.

-- update --

You can find it in this repository, under the name fastq_guessMyFormat.pl: https://github.com/NBISweden/GAAS/tree/master/annotation/Tools/Util

Here is a link to download it directly.


That link is now broken as well.


Thanks, updated now.


Hm, I would like to do this programmatically. I think something like the FastQC guesser looks more promising. Thanks, though.

13.6 years ago
Ryan Thompson ★ 3.6k

I wrote a Python-based FASTQ quality guesser: https://github.com/DarwinAwardWinner/fastqident

It uses BioPython's FASTQ parser, so it will work on anything that BioPython can parse.


I am getting a 404.


Looks good, but it doesn't install correctly. The module "placsupport" cannot be found in PyPI.


The placsupport module can be found at https://github.com/DarwinAwardWinner/placsupport

13.3 years ago
Marvin ▴ 900

Isn't that solving the wrong problem? The guessing code in FastQC looks fragile: it simply looks at the smallest character code used for qualities, so it depends on actually seeing low-quality bases.

I believe you should get the correct encoding from extra knowledge (i.e. knowing which version of which program generated the file, say from some log file), and then convert to a well-specified format (e.g. BAM) once. Please don't perpetuate the practice of guessing at the details of underspecified formats.
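As a rough illustration of the "convert once" step, here is a sketch that re-encodes a single quality string from Phred+64 (Illumina 1.3 to 1.7 style) to Phred+33, the encoding used in SAM text. It assumes the source encoding is already known from outside information; the function name `phred64_to_phred33` is made up for this example. Solexa+64 scores use a different scale and are deliberately rejected rather than shifted. With Biopython installed, `SeqIO.convert` with the `fastq-illumina` and `fastq-sanger` format names should do the same for whole files.

```python
def phred64_to_phred33(qual):
    """Re-encode one Phred+64 quality string as Phred+33.

    Hypothetical helper: assumes the input really is Phred+64. Characters
    below '@' would mean a negative score (Solexa scale or wrong offset),
    so they raise instead of being silently converted.
    """
    out = []
    for ch in qual:
        q = ord(ch) - 64  # decode to a Phred score
        if q < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        out.append(chr(q + 33))  # encode with the Phred+33 offset
    return "".join(out)
```

For example, `phred64_to_phred33("@h")` gives `"!I"` (Q0 and Q40), while a string containing `;` raises `ValueError`.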

13.3 years ago
Sequencegeek ▴ 740

In addition to Ryan's, I have a Python-based FASTQ quality guesser as well, if you would like to use it. It is just standard Python (no Biopython). PM me if interested.
