I created a GAlignmentPairs object from a bamfile that contains only properly mated paired-end reads aligned to a short reference sequence. As expected, I have two IRanges corresponding to read1 and read2 of the mated pair and the reference sequence in the object.
Checking it gives me lots of confidence:
table(isProperPair(h))
TRUE 37
seqinfo(h)
Seqinfo object with 1 sequence from an unspecified genome:
seqnames seqlengths isCircular genome
Test:geneR:2831:3169:1 339 NA <na>
My goal is to define one specific position in each IRange (i.e. a position in read1 and read2) and extract the corresponding nucleotides from each of the 37 corresponding read pairs (separately). It seems trivial like using getSeq with a BSGenome object, but I cannot achieve my goal in this context, which may be totally awry.
Note that this is NOT a pileup exercise but rather a means to select and extract nucleotide sites from each of read1 and read2 as a mated pair separately. Think of it as a mated-pair haplotyping exercise.
Any ideas, hints, alternatives are welcome.
Could you specify your problem, I do not really get the point? Is it that you have trouble getting the DNA sequence itself or is it rather a problem with handling or accessing the GAlignmentPairs output object (accessing the mate coordinates etc.)? Please provide a short example of what is exactly required and how the output should look like.