Question

Converting Quality Scores To Sanger

0

13.9 years ago

Haiping ▴ 110

My data were generated on a HiSeq 2000, so I used -F ILMFQ when running novoalign. Do I still need to convert the quality scores to Sanger before I use samtools pileup for SNP calling? Thanks for all the comments.

quality scoring • 9.8k views

ADD COMMENT • link updated 13.9 years ago by Docroberson ▴ 30 • written 13.9 years ago by Haiping ▴ 110

0

I just found this on the novoalign website:

Question: Does Novoalign support Sanger and Illumina FASTQ? Answer: Yes. Sanger and Illumina FASTQ formats are both supported. The quality values are converted to phred values using the Sanger method and used in subsequent alignment routines.

Does this mean that we don't need to worry about it?

ADD REPLY • link 13.9 years ago by Haiping ▴ 110

Ram · Answer 1 · 2011-07-01

3

13.9 years ago

Istvan Albert 102k

I believe that the HiSeq 2000 already uses Sanger encoding; what is called Illumina mode is now obsolete. If so, this of course means that you would need to rerun your mapping.

Check the post below on how to detect the encoding from your data:

A: Write Script For Selection Of Fastq File With Sanger Format
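As a rough illustration of the idea behind such a check (this is not the script from the linked post), the offset can often be guessed from the range of ASCII codes seen in the quality lines. The function name `guess_fastq_offset` and the thresholds are assumptions for this sketch: characters below `;` (code 59) can only occur in phred+33 data, and characters above `J` (code 74) are taken to indicate a +64 offset. It also assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import itertools


def guess_fastq_offset(lines, max_records=10000):
    """Guess the quality offset (33 or 64) from FASTQ lines.

    Hypothetical helper: returns 33, 64, or None when the observed
    range of quality characters fits both encodings.
    """
    lo, hi = 255, 0
    # In four-line FASTQ, the quality string is every 4th line (index 3, 7, ...).
    qual_lines = itertools.islice(lines, 3, None, 4)
    for qual in itertools.islice(qual_lines, max_records):
        codes = [ord(c) for c in qual.rstrip("\r\n")]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if lo < 59:
        # Codes this low cannot occur with a +64 offset.
        return 33
    if hi > 74:
        # Codes this high are assumed to be beyond the phred+33 range.
        return 64
    return None  # ambiguous, or no quality data seen
```

To use it on a file, pass an open handle, for example `with open("reads.fastq") as fh: print(guess_fastq_offset(fh))` (the file name is a placeholder). A `None` result means more reads are needed or the data have to be checked another way.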

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 13.9 years ago by Istvan Albert 102k

1

To comment on Istvan's answer: you can still find Illumina 1.3 (phred+64) quality scores. It depends on the version of the Illumina software installed on the machine, so even if the data come from a HiSeq 2000, you have to be careful and check. Nevertheless, it's true that the latest version generates Illumina 1.9 quality scores, which are phred+33 based (like Sanger).
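For reference, the difference between the two offsets is just a shift of 31 in the ASCII code. A minimal sketch of re-encoding a phred+64 quality string as phred+33 follows; the function name `illumina64_to_sanger` is made up for this example, and it assumes Illumina 1.3+ phred scores, not the older Solexa log-odds scores.

```python
def illumina64_to_sanger(qual):
    """Re-encode a phred+64 quality string as phred+33 (Sanger).

    Hypothetical helper; assumes the input holds phred scores with offset 64.
    """
    out = []
    for ch in qual:
        q = ord(ch) - 64  # decode with the Illumina 1.3+ offset
        if q < 0:
            raise ValueError(f"{ch!r} is not a valid phred+64 quality character")
        out.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(out)


# 'h' is Q40, 'B' is Q2, '@' is Q0 in phred+64
print(illumina64_to_sanger("hB@"))  # I#!
```

The `ValueError` doubles as a cheap sanity check: it fires if the input was already phred+33 and contains low quality characters.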

ADD REPLY • link 13.8 years ago by toni ★ 2.2k

0

I got my data nearly a year ago, and I am sure that it is phred+64. I tried to use the command in the link, but it failed because we do not have guess-encoding.py. Anyway, there seems to be no problem for SNP calling. Thanks for the comments.

ADD REPLY • link 13.9 years ago by Haiping ▴ 110

score 2 · Answer 2 · 2011-07-01

2

13.9 years ago

Docroberson ▴ 30

It doesn't depend so much on HiSeq versus GAIIx as which version of the pipeline you're using. HiSeq SHOULD be 1.3+, which does encode phred quality scores with an offset of 64. The best thing you can do is to confirm what version of the pipeline the core that generated your sequence is using.

If it is 1.3+, the option you used in NOVOALIGN is correct, as it internally converts to phred, and samtools requires phred scaling. If it were the old format that used the log of the probability ratios, you would need to use SLXFQ instead.
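To make the difference between the two scales concrete: the old Solexa score is Q_solexa = -10 log10(p / (1 - p)), while phred is Q_phred = -10 log10(p), which gives Q_phred = 10 log10(10^(Q_solexa / 10) + 1). The sketch below only illustrates that relationship; the function name `solexa_to_phred` is chosen for this example, and this is not a description of how NOVOALIGN implements it.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa (log-odds) quality score to the phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# The two scales diverge at low quality and converge at high quality.
for q in (-5, 0, 10, 40):
    print(q, round(solexa_to_phred(q), 2))
```

Because the scales nearly agree above about Q10, using the wrong option mostly distorts the low-quality bases.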

ADD COMMENT • link 13.9 years ago by Docroberson ▴ 30

0

I am sure my data are 1.3+, since the worst quality character is B, which corresponds to a quality score of 2. So I don't need to worry about the reliability of the SNP calling. Thanks for your comments.

ADD REPLY • link 13.9 years ago by Haiping ▴ 110