Question

What to do with read sequence quality

0

8.4 years ago

AHW ▴ 90

I am new to the sequence alignment field and come from a non-biological background, so I am quite confused. I have some understanding of read qualities, but I am not sure how to handle them effectively. For example, here are some reads with their qualities in compact form.

@r0
GAACGATACCCACCCAACTATCGCCATTCCAGCAT
+
EDCCCBAAAA@@@@?>===<;;9:99987776554
@r1
CCGAACTGGATGTCTCATGGGATAAAAATCATCCG
+
EDCCCBAAAA@@@@?>===<;;9:99987776554
@r2
TCAAAATTGTTATAGTATAACACTGTTGCTTTATG
+
EDCCCBAAAA@@@@?>===<;;9:99987776554

I also know that @ has the lowest value and ~ has the highest value. If the error probability of a base is e, the Phred quality Q is:

Q = -10 * log(e) / log(10)

and the Solexa quality sQ is:

sQ = -10 * log(e / (1 - e)) / log(10)
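
As a quick numerical check of the two formulas above, here is a minimal Python sketch. The function names `phred_quality` and `solexa_quality` are illustrative and do not come from any library; the error probabilities in the loop are arbitrary sample values.

```python
import math

def phred_quality(e: float) -> float:
    """Phred quality: Q = -10 * log10(e)."""
    return -10.0 * math.log10(e)

def solexa_quality(e: float) -> float:
    """Solexa quality: sQ = -10 * log10(e / (1 - e))."""
    return -10.0 * math.log10(e / (1.0 - e))

# Compare the two scales over a few sample error probabilities.
for e in (0.5, 0.1, 0.01, 0.001):
    print(f"e={e}: Q={phred_quality(e):.2f}, sQ={solexa_quality(e):.2f}")
```

The two scales differ noticeably only for large error probabilities; as e becomes small, e / (1 - e) approaches e and the values converge.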

What I would like to know is: when a read is aligned to the reference genome, is the total quality of all the bases considered, or is each base considered individually? For example, the read

GAACGATACCCACCCAACTATCGCCATTCCAGCAT

has a quality

EDCCCBAAAA@@@@?>===<;;9:99987776554

Should I consider the quality base by base (say, if the quality of any base is less than 40, do not try to align the read to the reference genome), or is the cumulative quality score of all the bases taken, so that the read is not considered for alignment if the total falls below some threshold?
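
To make the two filtering rules described above concrete, here is a rough Python sketch. It assumes the quality string uses the Phred+33 offset, since the string contains characters below `@` that would decode to negative scores under a +64 offset. The helper names and the thresholds are hypothetical, chosen only for illustration.

```python
def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert a quality string to per-base Phred scores (offset is an assumption)."""
    return [ord(c) - offset for c in qual]

def passes_min_base(scores: list[int], threshold: int) -> bool:
    """Per-base rule: reject the read if any single base falls below the threshold."""
    return min(scores) >= threshold

def passes_mean(scores: list[int], threshold: float) -> bool:
    """Whole-read rule: compare the mean score of the read with the threshold."""
    return sum(scores) / len(scores) >= threshold

qual = "EDCCCBAAAA@@@@?>===<;;9:99987776554"
scores = decode_quality(qual)
print("per-base scores:", scores)
print("any base < 40 rejects read:", not passes_min_base(scores, 40))
print("mean >= 20 keeps read:", passes_mean(scores, 20))
```

Under the Phred+33 assumption the highest score in this string is 36, so a per-base cutoff of 40 would reject every read shown, whereas a mean-based cutoff lets a few weak bases at the end of the read be outweighed by the stronger ones at the start.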

Is it also sensible to report the read quality along with valid alignments?

sequencing alignment • 1.4k views

updated 8.4 years ago by Devon Ryan 105k • written 8.4 years ago by AHW ▴ 90

score 1 · Answer 1 · 2017-03-18

1

8.4 years ago

Devon Ryan 105k

The quality of each base is considered individually. A common strategy for dealing with mismatches is to scale the penalty with the base call quality: a mismatch at a low-quality base incurs a smaller penalty than a mismatch at a high-quality base, since the low-quality call is more likely to be a sequencing error.
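
A minimal sketch of such a quality-scaled mismatch penalty follows. The linear scaling, the penalty bounds (2 and 6), the quality cap of 40, the Phred+33 offset, and both function names are assumptions made for illustration; individual aligners define their own schemes and defaults.

```python
def mismatch_penalty(q: int, min_pen: int = 2, max_pen: int = 6, q_cap: int = 40) -> int:
    """Penalty rises linearly from min_pen to max_pen as quality goes from 0 to q_cap."""
    return min_pen + (max_pen - min_pen) * min(q, q_cap) // q_cap

def alignment_penalty(read: str, ref: str, qual: str, offset: int = 33) -> int:
    """Sum quality-scaled penalties over mismatching positions (ungapped comparison)."""
    total = 0
    for read_base, ref_base, qual_char in zip(read, ref, qual):
        if read_base != ref_base:
            total += mismatch_penalty(ord(qual_char) - offset)
    return total

read = "GAACGATACCCACCCAACTATCGCCATTCCAGCAT"
qual = "EDCCCBAAAA@@@@?>===<;;9:99987776554"
ref_first = "T" + read[1:]   # mismatch at the first (high-quality) base
ref_last = read[:-1] + "A"   # mismatch at the last (low-quality) base
print(alignment_penalty(read, ref_first, qual))
print(alignment_penalty(read, ref_last, qual))
```

In this sketch the mismatch at the high-quality first base costs more than the one at the low-quality last base, and an exactly matching read accumulates no penalty at all regardless of its qualities.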

8.4 years ago by Devon Ryan 105k

0

Thanks. Do you mean that the quality does not play any role if the read matches exactly?

8.4 years ago by AHW ▴ 90

2

The answer to this is based entirely on what you're doing. In general, if you don't know the answer to this you probably shouldn't be writing a tool that might need to know it.

8.4 years ago by Devon Ryan 105k