Nucleotide Determination In Samtools Pileup Function
1
0
Entering edit mode
11.6 years ago
coryprzy • 0

I'm working with coding in c/c++ using the samtools suite.

Starting with the example 'calDepth.c' included in the samtools package, I've tried to make some modifications. Inside the pileup function I can insert these three lines of code:

        int dist = (int)pos - (int)pl->b->core.pos;
        // need to find reference DNA at position
        // 1 for A, 2 for C, 4 for G, 8 for T and 15 for N
        uint8_t* seq_ptr = bam1_seq(pl->b);
        int base = bam1_seqi(seq_ptr,dist);

noting that as defined in the example code, pl is a bam_pileup1_t struct thus pl->b is a bam1_t struct and pl->b->core is a bam1_core_t struct all of which are defined in bam.h.

As expected, base stores the nucleotide. But this is a pileup at a given position with many reads which could have different nucleotides if we were looking at a mutation. So firstly, which read does the 'base' value represent? And more importantly, how can I get the nucleotides for each read?

samtools mpileup c • 2.9k views
ADD COMMENT
0
Entering edit mode
11.6 years ago

as far as I understand you want to get all the bases of each reads for the pileup position.

loop over each read
  dicard if ihe duplicate flag was set, etc... (if needed)
  get the cigar string
  set refpos= pos(read)
  loop over each element of the cigar string until you get refpos=pileipup.

see C API for BAM: iterating over a CIGAR string.

ADD COMMENT
0
Entering edit mode

Yes that's what I want to get. Using the poster's example I could extract the information sans pileup function but it would be messy and it seems in the response that I should be able to do this still using pileup. The example at the bottom of the link seems to be intended to produce an array of depths for when there's a match or mismatch but doesn't produce accurate results, for instance placing a print statement in after the for loop printf("depth at j %d is %d\n",j,depth[j]); I see depths for the last few or last 1 position of the read only. And even with eliminating the 'if (op == BAM_CMATCH) condition, the numbers do not match those from the original cal_depth.c or from bam viewer programs.

ADD REPLY
0
Entering edit mode

I suppose the first question on the algorithm is the 'loop over each read'. If pileup is running iteratively for each position, how do I reference 'read index' in each pileup call? I don't see any indication of this existing in the header file. And then followed by, given the 'read index' how do I get base at pos of read index? Right now what I see is one read per position can be accessed, which unfortunately wouldn't be one-to-one.

ADD REPLY

Login before adding your answer.

Traffic: 1371 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6