Hello,
for further analysis (methylation calling and Mbias calculation etc.) I need to determine the intra-read-position of a read base at a given Pileupsite. I'm iterating over a SAM-file via pysam, resulting in a pysam.pileupcolumn Object for every given position. This PileupColumn Object contains as attribute a list of pysam.pileupread objects. For each of those I want to determine the exact position in the read. A pileupread object contains the alignment as attribute in form of a pysam.AlignedSegment object. This AlignedSegment object now has may of the classic SAM-properties such as the CIGAR-String and the query_sequence (WITH softclipped bases). Another attribute of the pileupread class is the query_position which seems pretty nice, however it remains unclear whether this query_position takes softclipped bases into account.
Does anyone know more about this attribute query_position? Has anyone had the same Problem so far?
Why reinvent the wheel, PileOMeth already does this.
Thanks, I'm going to check this out!
+1 for PileOMeth, I used it today and it worked great :)
Doing it the manual way (parsing CIGAR strings and SEQ with soft-clipped bases) is a lot harder than it sounds, actually. Normally I'm one who likes to reinvent wheels so I can learn how they work, but in this instance I remember it being a real PITA because there are so many conditions which are non-obvious at first. Honestly, best to use a tool that someone else has dumped a few months into thinking about it, in this circumstance.
instead of query_position you can try get_reference_positions, you can specify if soft clipped positions should be included.
Oh, awesome - pysam gets better and better by the day :D