Question

Rsamtools: get nucleotide of read ID X at position P

1

Entering edit mode

3.5 years ago

francois ▴ 80

Easy on paper, but I cannot crack it!

Say I have

a BAM file
a genomic position I am interested in (eg. chr20:4690579)
a specific read ID (QNAME) I am interested in (eg. c29bd60b-6f46-4c65-873c-4b74afc5ad87)

Using Rsamtools, how do I get the nucleotide from that read at that position?

I am fairly close using pileup(). Here is where I am so far;

bampoint <- BamFile(here('.../46345_hp.bam'))
pos <- ScanBamParam(which=GRanges('chr20', IRanges(4690579, 4690579)))
pilpa <- PileupParam(distinguish_strands=FALSE,
                     distinguish_nucleotides=TRUE)

pileup(bampoint, scanBamParam=pos, pileupParam=pilpa)

returns

  seqnames     pos nucleotide count           which_label
1    chr20 4690579          -     4 chr20:4690579-4690579
2    chr20 4690579          A     8 chr20:4690579-4690579
3    chr20 4690579          C     4 chr20:4690579-4690579
4    chr20 4690579          G   219 chr20:4690579-4690579
5    chr20 4690579          T   254 chr20:4690579-4690579

Not too bad – now how do I do this for one specific read?

Thoughts so far

Solution1: I could delete all other reads except my read ID from the bam file before I give it to pileup(), but I do not seem to be able to give it anything else than the BamFile() pointer. For example,

bam <- scanBam(bampoint)

then same as above, but pileup like:

pileup(bam, scanBamParam=pos, pileupParam=pilpa)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘pileup’ for signature ‘"missing"’

If I find how to feed a bam to pileup, then I can probably filter the bam beforehand to keep only my read ID...

Solution2: if there was something like a distinguish_read parameter in pileup(), I could simply look for my read in the output, but I understand that would not be really called 'piling up'...

Any idea?

rsamtools • 1.3k views

ADD COMMENT • link 3.5 years ago by francois ▴ 80

1

Entering edit mode

Do you have to use samtools? Grepping the sam file seems the natural way to filter by read name, then a little awk to find the right letter in the read based on the map position.

ADD REPLY • link 3.5 years ago by swbarnes2 14k

0

Entering edit mode

*Rsamtools. I think yes. Or at least would simplify many things if I could probe the alignment/read from within my R script.

ADD REPLY • link 3.5 years ago by francois ▴ 80

0

Entering edit mode

Read ID is called QNAME in SAM format. ScanBam seems to give access to the QNAME - maybe you can find out how to get that from ScanBamParam?

ADD REPLY • link 3.5 years ago by Ram 44k

0

Entering edit mode

Hmm... Getting closer.

scanBam(bampoint, param=ScanBamParam(what="qname"))[[1]][[1]]

Does get me all the QNAMEs in my BAM file. But I cannot find how to import specific reads by QNAME. I could also get the index of the read I want, but I am not sure where to feed that either.

The which parameter sounded promising but looks like it's only for genomic positions.

It sounds like filterBam() might allow me to write a whole bunch of single-read BAM files and import them back in, but evidently not ideal.

ADD REPLY • link 3.5 years ago by francois ▴ 80