I'm working with coding in c/c++ using the samtools suite.
Starting with the example 'calDepth.c' included in the samtools package, I've tried to make some modifications. Inside the pileup function I can insert these three lines of code:
int dist = (int)pos - (int)pl->b->core.pos;
// need to find reference DNA at position
// 1 for A, 2 for C, 4 for G, 8 for T and 15 for N
uint8_t* seq_ptr = bam1_seq(pl->b);
int base = bam1_seqi(seq_ptr,dist);
noting that as defined in the example code, pl is a bam_pileup1_t struct thus pl->b is a bam1_t struct and pl->b->core is a bam1_core_t struct all of which are defined in bam.h.
As expected, base stores the nucleotide. But this is a pileup at a given position with many reads which could have different nucleotides if we were looking at a mutation. So firstly, which read does the 'base' value represent? And more importantly, how can I get the nucleotides for each read?
Yes that's what I want to get. Using the poster's example I could extract the information sans pileup function but it would be messy and it seems in the response that I should be able to do this still using pileup. The example at the bottom of the link seems to be intended to produce an array of depths for when there's a match or mismatch but doesn't produce accurate results, for instance placing a print statement in after the for loop printf("depth at j %d is %d\n",j,depth[j]); I see depths for the last few or last 1 position of the read only. And even with eliminating the 'if (op == BAM_CMATCH) condition, the numbers do not match those from the original cal_depth.c or from bam viewer programs.
I suppose the first question on the algorithm is the 'loop over each read'. If pileup is running iteratively for each position, how do I reference 'read index' in each pileup call? I don't see any indication of this existing in the header file. And then followed by, given the 'read index' how do I get base at pos of read index? Right now what I see is one read per position can be accessed, which unfortunately wouldn't be one-to-one.