Piping samtools to R
1
0
Entering edit mode
7 months ago
noodle ▴ 590

Hi all,

I'm wondering if it's possible to stream the output of samtools mpileup into R for processing. If someone has done this, can you share the bits of code for accepting the stdin and writing the stdout?

Likewise, if I'm imaging that I can pipe this into R please let me know.

The motivation behind this is to process the data to look for exactly what I'm after without creating excessively large intermediate mpileup files.

Thanks!

R NGS bash samtools • 1.0k views
ADD COMMENT
0
Entering edit mode

Yes, possible. But before we go into details, what analysis do you want to do. Almost certainly tjere is a more efficient command line tool that does it faster and with less memory.

ADD REPLY
0
Entering edit mode

Calculate the error rate (number of bases, indels, starts or stops) over a user-specified window...sure, there are tools I could try to make useful here but I'd rather just take the direct output from samtools and write my own function for this.

ADD REPLY
2
Entering edit mode

(cough) (cough) (cough) don't use R for this (cough) (cough ) (cough)

ADD REPLY
0
Entering edit mode

haha, you should have that cough checked out ;)

Should my take-away be that R can't handle streams? Or something else? Any scripting language that might work? (I'm not going to learn c++ for this...)

ADD REPLY
0
Entering edit mode

(I'm not going to learn c++ for this...

are you learning a new language ?, great

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

You need a variant caller to reliably call indels and things like that, that has tested heuristics and cutoffs to distinguish signal from noisy calls. I would run this through a variant caller and then process the VCF file rather than the pileup itself.

ADD REPLY
0
Entering edit mode

I get that I could process a vcf file, but my question seems like something so simple and achievable ...guess not.

ADD REPLY
0
Entering edit mode

I actually found this to be bad (outdated?) advice. I had no problem to pipe the output of samtools mpileup into an Rscript (R > 4.1) that loaded packages and functions, had intermediate file structures, and streamed to stdout. Copying a skeleton script below which can be called with samtools mpileup -l my.bed -f my.fa my.bam | Rscript my.R > R.stdout.tsv

Again, my goal here was to have a work-around for creating large intermediate files.

func.1 <- function(in.1){...}
func.2 <- function(out.1){...}
func.3 <- function(out.2){...}

f <- file("stdin")
open(f)
while(length(line <- readLines(f,n=1)) > 0) {

    this.chr <- unlist(strsplit(line, "\t")[[1]][1])
    this.coor <- unlist(strsplit(line, "\t")[[1]][2])
    this.ref <- unlist(strsplit(line, "\t")[[1]][3])
    this.depth  <- unlist(strsplit(line, "\t")[[1]][4])
    this.pileup  <- unlist(strsplit(line, "\t")[[1]][5])

    out.1 <- func.1(this.pileup)
    out.2 <- func.2(out.1)
    out.f <- func.3(out.2)

cat(out.f, sep = "\n")

}
ADD REPLY
1
Entering edit mode
7 months ago
noodle ▴ 590

This is straightforward. Can perform functions on multiple lines at a time if they are loaded as such. Skeleton Rscript below, which can be called as a pipe with samtools mpileup -l my.bed -f my.fa my.bam | Rscript my.R > myR.stdout

func.1 <- function(in.1){...}
func.2 <- function(out.1){...}
func.3 <- function(out.2){...}

f <- file("stdin")
open(f)
while(length(line <- readLines(f,n=1)) > 0) {

  this.chr <- strsplit(line, "\t")[[1]][1]
  this.coor <- strsplit(line, "\t")[[1]][2]
  this.ref <- strsplit(line, "\t")[[1]][3]
  this.depth  <- strsplit(line, "\t")[[1]][4]
  this.pileup  <- strsplit(line, "\t")[[1]][5]

  out.1 <- func.1(this.pileup)
  out.2 <- func.2(out.1)
  out.f <- func.3(out.2)

  cat(out.f, sep = "\n")

}
ADD COMMENT

Login before adding your answer.

Traffic: 2175 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6