Hello everyone!
I am trying to rewrite an existing bam file via pysam. I have reference.fasta file and file_to_be_changed.bam. I would like to change the nucleotide on a read given the position of the nucleotide and the reference position.
For example:
pos 0123456
ref ATCGGTC
bam ATCG
CGT
GGTC
I think that the problem could be solved with:
if pos_on_read == 0:
if nucl_on_reference == 'A':
nucl_on_read = 'C'
However, I have some problems with realising the algorithm via pysam.. Give me a hand please!
I have an ancient genome matrix which consists of the frequences of a nucleuote occurring given the reference base and the position on the read. I would like to apply this operator to a bam file to make it pseudo-ancient.
But then some of the alignments would not be correct. How come that is not a concern? To have incorrect alignments?
An alignment operates in the context of the entire information in the genome, you can't expect that locally changing both the reference and read would still produce valid alignments across the genome. Sure, some, or even the majority might stay correct, but that ratio depends on many factors and the error rate would be highly uneven, might be the most error-prone in the locations where the information between ancient and non-ancient genomes is more meaningful.
Won't realigment be helpful?
that would work
you could extract your reads from the BAM file then realign them to the new genome