Hello,
I am working on parsing some BAM files from pysam and I am was looking for some information on what the output from pysam fetch produdes.
This is what I am working with code wise:
import pysam
import sys
from ast import literal_eval as make_tuple
samfile = pysam.AlignmentFile('sample1.bam', "rb")
for c in contigs:
for read in samfile.fetch(c):
r=str(read).split("\t")
cig=read.cigarstring
MD_TAG=make_tuple(r[-1])[-2][1]
print(read)
This pulls some information such as the CIGAR string and MD Tag. I want to know what the other features that are being output are though from fetch (I looked around but I haven't found a satisfactory explanation). This is the output example from one of my BAM files:
SRR5720260.4404562.1 83 0 1135779 40 144M 0 1135607 144 AATTATTGTCACTTTGCCAATATTTTCATTTAATTCCTCAAATCCAATTATGATTGAAGGGTCAATTGGTGTGGTAAGGCTAGAAGGTTTGTAAATAGTAGATTCAGTTTGAGTCGGTTGATTTTCCCCAGTCCAGTTTGATAG array('B', [36, 36, 36, 36, 36, 36, 36, 36, 27, 36, 36, 36, 36, 36, 36, 36, 32, 36, 32, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 36, 36, 32, 36, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 32, 32, 32, 32]) [('AS', -30), ('XN', 0), ('XM', 6), ('XO', 0), ('XG', 0), ('NM', 6), ('MD', '9A58T0C8T31C32A0'), ('YS', -10), ('YT', 'CP')]
These are the columns I have been able to extrapolate:
- Read -->
SRR5720260.4404562.1
- unknowcn -->
83
- unknown -->
0
- unknown -->
1135779 40
- CIGAR -->
144M
- unknown -->
0
- unknown -->
1135607 144
- query sequence:
AATTATTGTCACTTTGCCAATATTTTCATTTAATTCCTCAAATCCAATTATGATTGAAGGGTCAATTGGTGTGGTAAGGCTAGAAGGTTTGTAAATAGTAGATTCAGTTTGAGTCGGTTGATTTTCCCCAGTCCAGTTTGATAG
- unknown (I think this is quality scores) -->
array('B', [36, 36, 36, 36, 36, 36, 36, 36, 27, 36, 36, 36, 36, 36, 36, 36, 32, 36, 32, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 36, 36, 32, 36, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 36, 36, 36, 36, 36, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 32, 32, 32, 32, 32])
- Read info -->
[('AS', -30), ('XN', 0), ('XM', 6), ('XO', 0), ('XG', 0), ('NM', 6), ('MD', '9A58T0C8T31C32A0'), ('YS', -10), ('YT', 'CP')]
Known values: - Paired-end alignment quality (AS) - Amplicon name tag (XN) - Edit distance tag (NM) - the number of mismatches in the alignment(XM) - the number of gap opens, for both read and reference gaps, in the alignment (XO) - the number of gap extensions, for both read and reference gaps, in the alignment.(XG) - edit distance to the reference, including ambiguous bases but excluding clipping(NM) - MD tag (MD) Unknown values: - YS - YT
Any help would be appreciated and please correct me if I have misunderstood any of these values!
https://samtools.github.io/hts-specs/SAMv1.pdf
Eric - Do not delete posts that have received feedback.