Question

R: extract interest nucleotide from BAM files (BS-Seq)

3

Entering edit mode

9.2 years ago

Shicheng Guo ★ 9.5k

Hi colleagues,

I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed?

As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R.

library("Rsamtools")
library("GenomicRanges")
library("GenomicAlignments")

setwd("/home/shg047/bam")
# quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE)
flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param1<-ScanBamParam(flag=flag1, what="seq")
gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1)
seq<-mcols(gal1)$seq
is_on_minus <- as.logical(strand(gal1) == "-")
seq[is_on_minus] <- reverseComplement(seq[is_on_minus])

bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F)
paln <- readGAlignmentsList(bf)
j <- junctions(paln, with.revmap=TRUE)
roi <- GRanges("chr1", IRanges(10056, width=500))
findOverlaps(paln,roi)
j_overlap <- paln[paln %over% roi]
paln[j_overlap$revmap[[1]]]

Best regards

alignment • 2.6k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.2 years ago by Shicheng Guo ★ 9.5k

1

Entering edit mode

Do you absolutely have to do this with R? It's great for some things, but terrible for things like this.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

R is intended for statistical data analysis. You can use samtools mpileup to get the base calls for a base position

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.2 years ago by Irsan ★ 7.8k

Ram · Answer 1 · 2015-09-29

0

Entering edit mode

9.2 years ago

glihm ▴ 660

Hi Shicheng Guo,

like posted by Devon Ryan, you have lot's of opportunities in order to do that. The thing is: "What orders are?" :D

You have to do it with R -> This package from bioconductor will be helpful!
You want to try an other solution, perhaps you have some knowledge with Python -> pysam is your man.
With other languages, you can use SamTools in order to extract sequences and then process them with bash scripts for example or languages like C -> samtools.

Depending on the R bioconductor package performance, other programming languages may be faster and easier to extend/develop than R scripts.

However, if you are a pro R, enjoy it with Rsamtools package!

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.2 years ago by glihm ▴ 660

1

Entering edit mode

More specifically, if R isn't needed then PileOMeth might be the simplest tool.

ADD REPLY • link 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

I did not know this one, thank you!

ADD REPLY • link 9.2 years ago by glihm ▴ 660