I'm trying to modify the header and chromosome to include the file prefix and output this to a new bam file. I was going to do this with sed but would rather do it with pysam if it is possible.
A line of my bam file is as follows:
GWNJ-0901:658:GW2006263225th:6:1101:12824:2610 99 NC_011993.1_Escherichia_coli_LF82_complete_genome_length_4773108 2056740 42 27M = 2056869 150 CGGCTGCACGGGCGAAGTTTCCGCCGC FJ-AJAJJJ<AAFJJJ<J<JFJ<-A7A AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:27 YS:i:-3 YT:Z:CP
I want to access the 3rd column (the chromosome column, I think it is called) and prepend a prefix to it:
prefix_NC_011993.1_Escherichia_coli_LF82_complete_genome_length_4773108
I'm having trouble writing out this amended column to an output file using fetch:
for read in input_bam.fetch(reference=species):
    print(input_bam.get_reference_name(read.reference_id))  # Returns chromosome column
    prefixed_chrom = prefix + '_' + input_bam.get_reference_name(read.reference_id)
    with pysam.AlignmentFile(full_output_path, "w", template=input_bam) as outf:
        a = pysam.AlignedSegment()
        a.query_name = read.query_name
        a.query_sequence = read.query_sequence
        a.get_reference_name(read.reference_id) = prefixed_chrom
        a.flag = read.flag
        a.reference_id = read.reference_id
        a.reference_start = read.reference_start
        a.mapping_quality = read.mapping_quality
        a.cigar = read.cigar
        a.next_reference_id = read.next_reference_id
        a.next_reference_start = read.next_reference_start
        a.template_length = read.template_length
        a.query_qualities = read.query_qualities
        a.tags = read.tags
        outf.write(a)
SyntaxError: cannot assign to function call
How can I write this amended chromosome column to an output file using pysam? Many thanks!
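For what it's worth, the SyntaxError arises because get_reference_name(...) is a function call, and Python cannot assign to the result of a call. In a BAM file a read stores only a numeric reference_id; the name is looked up in the header's @SQ lines. One possible approach is therefore to rename the sequences in the header and write the reads unchanged. Below is a minimal sketch under these assumptions: the helper names prefix_reference_names and write_prefixed_bam are hypothetical, the output is opened once outside the loop (opening it inside the loop would overwrite it on every read), mode "wb" is used to get BAM rather than SAM, and the order of @SQ entries is unchanged so each read's reference_id still points at the right sequence. I have not tested it against this particular file.

```python
import copy


def prefix_reference_names(header_dict, prefix):
    """Return a copy of a SAM header dict with prefix + '_' prepended to every @SQ SN."""
    new_header = copy.deepcopy(header_dict)
    for sq in new_header.get("SQ", []):
        sq["SN"] = f"{prefix}_{sq['SN']}"
    return new_header


def write_prefixed_bam(input_path, output_path, prefix):
    """Copy all reads from input_path to output_path under a header with renamed references."""
    import pysam  # imported here so the helper above can be used without pysam installed

    with pysam.AlignmentFile(input_path, "rb") as input_bam:
        new_header = pysam.AlignmentHeader.from_dict(
            prefix_reference_names(input_bam.header.to_dict(), prefix)
        )
        # Open the output once, before iterating over the reads.
        with pysam.AlignmentFile(output_path, "wb", header=new_header) as outf:
            # until_eof=True iterates over every record and does not need an index.
            for read in input_bam.fetch(until_eof=True):
                # Assumption: reference_id is a positional index into the header,
                # so the unchanged read resolves to the prefixed name on output.
                outf.write(read)
```

A call such as write_prefixed_bam("in.bam", "out.bam", "prefix") (file names are placeholders) should then give records whose third column reads prefix_NC_011993.1_... when viewed with samtools view. To keep only one species as in the original loop, fetch(reference=species) could be used instead, which requires an indexed BAM.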
Linda: This is a specific question, so don't use the Forum tag. Forum posts are generally open-ended discussions that may have more than one point of view.
My bad. This is my first question. Hopefully it's updated now.
see also Bam File: Change Chromosome Notation