Hi,
Can someone explain what the following abbreviations mean: flag 99, mq 60, cigar 88M23D62M, c1 0,5, MD 88 CACC'XI 1, NM 23, AS 121, PS 11297, rd esp the following 1:2114:12111:13792]
Thank you
Hi, here is a breakdown of each part that you asked about:
flag 99: The FLAG is a bitwise sum of properties of the read alignment. 99 = 1 + 2 + 32 + 64, which means the read is paired (1), it is mapped in a proper pair (2), its mate is on the reverse strand (32), and it is the first read in the pair (64). Because bit 16 is not set, the read itself aligns to the forward strand.
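If it helps, here is a minimal Python sketch for decoding any FLAG value into its set bits. The bit values come from the FLAG table in section 1.4 of the SAM specification; the short descriptions and the function name decode_flag are my own, not part of any library.

```python
# Bit values from the FLAG table in the SAM specification (section 1.4).
# The descriptions are paraphrased for readability.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of every bit that is set in a SAM FLAG."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


print(decode_flag(99))
# ['read paired', 'read mapped in proper pair', 'mate on reverse strand', 'first in pair']
```

The mate of this read would normally carry flag 147 (1 + 2 + 16 + 128).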
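mq 60: This is the mapping quality (MAPQ), a Phred-scaled estimate of the probability that the read is placed at the wrong position: MAPQ = -10 * log10(probability the mapping is wrong). Higher values mean more confidence. A value of 60 corresponds to about a 1 in 1,000,000 chance of a wrong placement, and it is the highest value that some aligners report.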
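As a quick illustration of the Phred scale used for MAPQ, this small sketch converts a MAPQ value back to the error probability it nominally encodes. The function name is hypothetical, and the result is only the aligner's own estimate, since each aligner computes and caps MAPQ differently.

```python
def mapq_to_error_probability(mapq):
    """Invert MAPQ = -10 * log10(p) to get the nominal probability p."""
    return 10 ** (-mapq / 10)


for q in (0, 20, 60):
    print(q, mapq_to_error_probability(q))
```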
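cigar 88M23D62M: The CIGAR string describes how the read aligns to the reference genome. In this case, the read has 88 aligned bases (M, which covers both matches and mismatches), followed by 23 reference bases that are deleted from the read (D), and then 62 more aligned bases. The read is therefore 150 bases long and spans 173 bases on the reference.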
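To check the arithmetic on a CIGAR string yourself, something like the following sketch works. Which operations consume read bases and which consume reference bases is taken from the CIGAR table in the SAM specification; the function name cigar_lengths is made up for this example.

```python
import re

# One CIGAR element is a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification: M, I, S, = and X consume read (query) bases;
# M, D, N, = and X consume reference bases.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def cigar_lengths(cigar):
    """Return (read length, reference span) implied by a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    query_len = sum(n for n, op in ops if op in QUERY_OPS)
    ref_len = sum(n for n, op in ops if op in REF_OPS)
    return query_len, ref_len


print(cigar_lengths("88M23D62M"))  # (150, 173)
```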
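c1 0,5: As written, this is not one of the predefined tags in the SAM specification. Tags that contain lowercase letters are reserved for local use, so its meaning depends on the program that wrote the file. Check the documentation of the tool that produced your BAM.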
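MD 88 CACC'XI 1: The MD tag encodes the mismatched and deleted reference bases in the alignment. Numbers count matching bases, a letter is the reference base at a mismatch, and ^ followed by letters gives reference bases that are deleted from the read. The value quoted here looks garbled, but given the CIGAR it most likely begins with 88 matching bases followed by the deleted reference sequence.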
NM 23: NM is the edit distance, which is the number of differences (mismatches plus inserted and deleted bases) between the read and the reference. Here it is 23, which equals the length of the deletion in the CIGAR, so the aligned bases themselves contain no mismatches.
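AS 121: AS is the alignment score assigned by the aligner. How it is calculated depends on the aligner and its scoring parameters (match score, mismatch penalty, gap penalties), so scores are only comparable between alignments produced with the same settings.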
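As a worked example of how such a score can arise, the sketch below assumes BWA-MEM's default scoring (match +1, gap open 6, gap extend 1, where a gap of length k costs open + k * extend). The thread does not say which aligner was used, so this is only an assumption, but under it a 150-base read with no mismatches and one 23-base deletion scores exactly 121.

```python
# ASSUMPTION: BWA-MEM default scoring parameters. The aligner actually
# used for this read is not stated in the thread.
MATCH_SCORE = 1
GAP_OPEN = 6
GAP_EXTEND = 1


def score_with_one_deletion(matching_bases, deletion_length):
    """Score an alignment with no mismatches and a single deletion."""
    gap_cost = GAP_OPEN + deletion_length * GAP_EXTEND
    return matching_bases * MATCH_SCORE - gap_cost


# 88 + 62 = 150 matching bases, one deletion of 23 bases.
print(score_with_one_deletion(88 + 62, 23))  # 121
```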
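PS 11297: In the SAM optional fields specification, PS is the phase set, an identifier for the phased block the read was assigned to. Reads that share the same PS value belong to the same phase set. It is not a score.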
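rd esp the following 1:2114:12111:13792]: This is the end of the read name, which is copied from the FASTQ header. In an Illumina read name the last four colon-separated fields are the flowcell lane (1), the tile number (2114), and the x (12111) and y (13792) coordinates of the cluster within the tile.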
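For completeness, here is a small sketch that splits that fragment into named fields. It assumes the fragment is the tail of an Illumina-style read name (instrument:run:flowcell:lane:tile:x:y) and that the trailing bracket is just display residue; the function name is hypothetical.

```python
def parse_illumina_tail(fragment):
    """Split the last four colon-separated fields of an Illumina read name.

    ASSUMPTION: the name ends with lane:tile:x:y, as in Illumina's
    sequence identifier format. Stray square brackets are stripped.
    """
    fields = fragment.strip("[]").split(":")
    lane, tile, x, y = (int(f) for f in fields[-4:])
    return {"lane": lane, "tile": tile, "x": x, "y": y}


print(parse_illumina_tail("1:2114:12111:13792]"))
# {'lane': 1, 'tile': 2114, 'x': 12111, 'y': 13792}
```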
Please see the link GenoMax provided to the SAM specification on the samtools GitHub. It is worth checking that documentation first whenever you have questions about what your output means.
Those are SAM format fields. Check section 1.4 here: https://samtools.github.io/hts-specs/SAMv1.pdf
That is part of the fastq read header. Illumina read headers are described here: https://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers
Thank you for the link and details