What should a SAM/BAM record contain when there are no quality scores?
7.3 years ago
Ann ★ 2.4k

Hi!

As everyone who works with this format already knows, the SAM/BAM sequence alignment format contains a field reserved for a per-base quality score string.

But if per-base quality scores are not available, what value should this field contain?

The specification (p. 16) says this field should contain

"Phred base quality (a sequence of 0xFF if absent)"

But I'm not sure how to interpret this.

If you (or your students :-) were writing alignments to a file in SAM format and did not have access to quality scores, what would you put in this field?

All the best,

Ann

samtools SAM BAM • 1.9k views

There's also text about this in §1.4 talking specifically about SAM's QUAL: “This field can be a ‘*’ when quality is not stored”.

Incidentally, if the text about “a sequence of 0xFF if absent” is on p16 of your copy of the SAM specification, then you have a somewhat out of date specification from before the defined aux tags were split off into a separate SAMtags.pdf document. There's not much substantive that's changed since then but there have been a few clarifications to the text, so it's worth working from the current specifications at http://samtools.github.io/hts-specs/.

7.3 years ago
jkbonfield ★ 1.3k

In SAM format the field is '*'. In BAM format this is replaced by a run of 0xFF bytes, one per base, so that its length matches the length of the sequence. (I've no idea why.)
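To make the two encodings concrete, here is a minimal Python sketch. The function names `sam_qual_field` and `bam_qual_bytes` are made up for illustration and are not part of samtools, htslib, or any other library; the sketch assumes qualities are given as a list of Phred integers, or `None` when they are not stored.

```python
def sam_qual_field(quals):
    """Return the SAM QUAL text: '*' if absent, else Phred+33 ASCII."""
    if quals is None:
        return "*"
    return "".join(chr(q + 33) for q in quals)


def bam_qual_bytes(quals, l_seq):
    """Return the BAM qual bytes: l_seq bytes of 0xFF if absent,
    else the raw Phred values (no +33 offset in BAM)."""
    if quals is None:
        return b"\xff" * l_seq
    return bytes(quals)


seq = "GGGCATTT"
print(sam_qual_field(None))                  # text form for SAM
print(bam_qual_bytes(None, len(seq)).hex())  # one 0xFF byte per base
```

Note that the BAM form needs the sequence length even when there are no qualities, whereas the SAM form does not.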

Note this raises a curious ambiguity. How does a single base read with confidence 9 ('*') get interpreted?
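The ambiguity is easy to reproduce. The sketch below uses a hypothetical, deliberately naive decoder (`decode_sam_qual` is an illustrative name, not a real library function) that treats a lone '*' as "no qualities", which is one plausible reading and not necessarily what any particular tool does.

```python
def decode_sam_qual(qual_text):
    """Naive decoder (an assumption for illustration):
    a lone '*' means qualities are absent."""
    if qual_text == "*":
        return None
    return [ord(c) - 33 for c in qual_text]


# '*' is ASCII 42, so under Phred+33 it would also encode quality 9.
print(ord("*") - 33)

# A 1-base read written with quality 9 is indistinguishable from "absent".
print(decode_sam_qual("*"))
print(decode_sam_qual("**"))  # two bases, both quality 9: no ambiguity
```

In BAM the two cases stay distinct, since quality 9 is stored as the byte 0x09 and absent quality as 0xFF.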


The BAM “run of 0xFF values” is there so that the offset within the record to the auxiliary data is the same whether QUAL data is present or absent.
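As a rough sketch of why this works, the offset to the auxiliary data can be computed from three header fields alone. This assumes the alignment record layout as described in the BAM section of the current specification (32 bytes of fixed fields after `block_size`, then read name, CIGAR, 4-bit packed sequence, and qualities); `aux_offset` is a name made up for this example.

```python
def aux_offset(l_read_name, n_cigar_op, l_seq):
    """Byte offset of the aux data, counted from the first byte
    after block_size (layout assumed from the BAM specification)."""
    fixed = 32                      # refID .. tlen
    read_name = l_read_name         # includes the trailing NUL
    cigar = 4 * n_cigar_op          # one uint32 per CIGAR operation
    packed_seq = (l_seq + 1) // 2   # two bases per byte
    qual = l_seq                    # always l_seq bytes, 0xFF-filled if absent
    return fixed + read_name + cigar + packed_seq + qual


# Example: 4-character name plus NUL, one CIGAR op, 101 bases.
print(aux_offset(l_read_name=5, n_cigar_op=1, l_seq=101))
```

Because `qual` always occupies `l_seq` bytes, the function never needs to know whether qualities were stored.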

7.3 years ago
h.mon 35k

I "asked" your question to NCBI online Blast and got this answer:

@HD     VN:1.2  SO:coordinate   GO:reference
@SQ     SN:HWI-1KL182:84:D2EECACXX:6:1109:2522:49059 1:N:0:CGATGT       LN:101
@PG     ID:0    VN:2.6.1+       PN:blastn
XM_019987163.1  0       HWI-1KL182:84:D2EECACXX:6:1109:2522:49059 1:N:0:CGATGT  1       255     1666H101M796H   *       0       0       GGGCATTTGAGCACCGAGGCTCGCGAGAAAGACAAGTGCAAGGACAGGGAGCGGGAGCACTCGGAATCGCGCAAGGATCTGGGCACGGATGAGCACAAGGC   *       AS:i:101        EV:f:3.49128e-44        NM:i:0  PI:f:100.00     BS:f:187.632

I "asked" the same question to reformat.sh, converting a fasta file to sam:

@HD VN:1.4  SO:unsorted
@PG ID:BBMap    PN:BBMap    VN:37.17
HWI-1KL182:84:D2EECACXX:6:1109:2522:49059 1:N:0:CGATGT  4   *   0   0   *   *   0   0   GGGCATTTGAGCACCGAGGCTCGCGAGAAAGACAAGTGCAAGGACAGGGAGCGGGAGCACTCGGAATCGCGCAAGGATCTGGGCACGGATGAGCACAAGGC   *

So it seems it is an asterisk if there are no qualities. However, I do not know if Brian Bushnell and NCBI devs followed the SAM specs, or decided to create their own standard.
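For anyone who wants to run the same check on their own tool's output, here is a small sketch that pulls out the QUAL column of a SAM alignment line. It assumes tab-separated fields with QUAL as the 11th mandatory field; `qual_is_absent` and the `read1` record are invented for the example and are not taken from the outputs above.

```python
def qual_is_absent(sam_line):
    """Return True if the QUAL column (11th field) is '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    return fields[10] == "*"


# Hypothetical unmapped record with no stored qualities.
record = "\t".join(
    ["read1", "4", "*", "0", "0", "*", "*", "0", "0", "GGGCATTT", "*"]
)
print(qual_is_absent(record))
```

Header lines (those starting with '@') have fewer fields and should be skipped before calling it.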
