I am trying to write reads into a SAM file format. First I'll extract the read from BAM file, update the query name and then write it into a SAM file. I need to perform this on each read of the BAM file. How can I write it back into the SAM file? My assumption is the SAM file is just a tab-delimited file, so first I'll write a tab-delimited file and then append the header and finally convert the sam file into the BAM file. I'm using the pysam and doing the following
for read in READS:
query_name = read.query_name + 'added_info'
contig = read.reference_name
query_sequence = read.query_sequence
flag = read.flag
reference_id = read.reference_id
reference_start = read.reference_start
mapping_quality = read.mapping_quality
cigar = read.cigarstring
next_reference_id = read.next_reference_id
next_reference_start = read.next_reference_start
template_length = read.template_length
query_qualities = read.query_qualities
query_quality_string = ''
query_quality_string = query_quality_string.join([chr(ch + 33) for ch in query_qualities])
tags = read.tags
outf.write(query_name + '\t' + flag + '\t' + contig + '\t' + reference_start + '\t' +
mapping_quality + '\t' + cigar + '\t' + next_reference_id + '\t' + next_reference_start + '\t' +
str(template_length) + '\t' + query_sequence + '\t' + query_quality_string + '\t' + tags)
Issues:
- Is the order in the SAM file correct?
- I have converted the query_qualities back into quality string by converting it into ASCII. Is that correct?
- How can I convert the
tags
into proper tags in the SAM file format. Currently, it is in this format[('NM', 12), ('MD', '81T74'), ('AS', 134), ('XS', 20)]
is it just some regular string processing?