Easy on paper, but I cannot crack it!
Say I have
- a BAM file
- a genomic position I am interested in (eg. chr20:4690579)
- a specific read ID (QNAME) I am interested in (eg. c29bd60b-6f46-4c65-873c-4b74afc5ad87)
Using Rsamtools, how do I get the nucleotide from that read at that position?
I am fairly close using pileup(). Here is where I am so far;
bampoint <- BamFile(here('.../46345_hp.bam'))
pos <- ScanBamParam(which=GRanges('chr20', IRanges(4690579, 4690579)))
pilpa <- PileupParam(distinguish_strands=FALSE,
distinguish_nucleotides=TRUE)
pileup(bampoint, scanBamParam=pos, pileupParam=pilpa)
returns
seqnames pos nucleotide count which_label
1 chr20 4690579 - 4 chr20:4690579-4690579
2 chr20 4690579 A 8 chr20:4690579-4690579
3 chr20 4690579 C 4 chr20:4690579-4690579
4 chr20 4690579 G 219 chr20:4690579-4690579
5 chr20 4690579 T 254 chr20:4690579-4690579
Not too bad – now how do I do this for one specific read?
Thoughts so far
Solution1: I could delete all other reads except my read ID from the bam file before I give it to pileup(), but I do not seem to be able to give it anything else than the BamFile() pointer. For example,
bam <- scanBam(bampoint)
then same as above, but pileup like:
pileup(bam, scanBamParam=pos, pileupParam=pilpa)
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘pileup’ for signature ‘"missing"’
If I find how to feed a bam to pileup, then I can probably filter the bam beforehand to keep only my read ID...
Solution2: if there was something like a distinguish_read
parameter in pileup(), I could simply look for my read in the output, but I understand that would not be really called 'piling up'...
Any idea?
Do you have to use samtools? Grepping the sam file seems the natural way to filter by read name, then a little awk to find the right letter in the read based on the map position.
*Rsamtools. I think yes. Or at least would simplify many things if I could probe the alignment/read from within my R script.
Read ID is called QNAME in SAM format.
ScanBam
seems to give access to the QNAME - maybe you can find out how to get that fromScanBamParam
?Hmm... Getting closer.
Does get me all the QNAMEs in my BAM file. But I cannot find how to import specific reads by QNAME. I could also get the index of the read I want, but I am not sure where to feed that either.
The
which
parameter sounded promising but looks like it's only for genomic positions.It sounds like
filterBam()
might allow me to write a whole bunch of single-read BAM files and import them back in, but evidently not ideal.