Rsamtools: get nucleotide of read ID X at position P
0
1
Entering edit mode
3.5 years ago
francois ▴ 80

Easy on paper, but I cannot crack it!

Say I have

  • a BAM file
  • a genomic position I am interested in (eg. chr20:4690579)
  • a specific read ID (QNAME) I am interested in (eg. c29bd60b-6f46-4c65-873c-4b74afc5ad87)

Using Rsamtools, how do I get the nucleotide from that read at that position?


I am fairly close using pileup(). Here is where I am so far;

bampoint <- BamFile(here('.../46345_hp.bam'))
pos <- ScanBamParam(which=GRanges('chr20', IRanges(4690579, 4690579)))
pilpa <- PileupParam(distinguish_strands=FALSE,
                     distinguish_nucleotides=TRUE)

pileup(bampoint, scanBamParam=pos, pileupParam=pilpa)

returns

  seqnames     pos nucleotide count           which_label
1    chr20 4690579          -     4 chr20:4690579-4690579
2    chr20 4690579          A     8 chr20:4690579-4690579
3    chr20 4690579          C     4 chr20:4690579-4690579
4    chr20 4690579          G   219 chr20:4690579-4690579
5    chr20 4690579          T   254 chr20:4690579-4690579

Not too bad – now how do I do this for one specific read?


Thoughts so far

Solution1: I could delete all other reads except my read ID from the bam file before I give it to pileup(), but I do not seem to be able to give it anything else than the BamFile() pointer. For example,

bam <- scanBam(bampoint)

then same as above, but pileup like:

pileup(bam, scanBamParam=pos, pileupParam=pilpa)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘pileup’ for signature ‘"missing"’

If I find how to feed a bam to pileup, then I can probably filter the bam beforehand to keep only my read ID...

Solution2: if there was something like a distinguish_read parameter in pileup(), I could simply look for my read in the output, but I understand that would not be really called 'piling up'...

Any idea?

rsamtools • 1.3k views
ADD COMMENT
1
Entering edit mode

Do you have to use samtools? Grepping the sam file seems the natural way to filter by read name, then a little awk to find the right letter in the read based on the map position.

ADD REPLY
0
Entering edit mode

*Rsamtools. I think yes. Or at least would simplify many things if I could probe the alignment/read from within my R script.

ADD REPLY
0
Entering edit mode

Read ID is called QNAME in SAM format. ScanBam seems to give access to the QNAME - maybe you can find out how to get that from ScanBamParam?

ADD REPLY
0
Entering edit mode

Hmm... Getting closer.

scanBam(bampoint, param=ScanBamParam(what="qname"))[[1]][[1]]

Does get me all the QNAMEs in my BAM file. But I cannot find how to import specific reads by QNAME. I could also get the index of the read I want, but I am not sure where to feed that either.

The which parameter sounded promising but looks like it's only for genomic positions.

It sounds like filterBam() might allow me to write a whole bunch of single-read BAM files and import them back in, but evidently not ideal.

ADD REPLY

Login before adding your answer.

Traffic: 2739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6