I am currently working on RNA-seq data. One of the aims is to scan intergenic regions for transcription of e.g. small RNAs, missing transcripts. I have aligned to the reference genome and filtered the reads (bwa + blat) and can compute overlaps and coverage in R/Bioconductor and IRanges, that works. The problem is methodical: how to best detect a peak in the sequencing coverage in an intergenic region. My procedure now is a rather naive:
- Find regions of coverage > arbitrary threshold (say 30), with width > min.width (e.g. 30 bases)
Remove all regions overlapping with genes/coding sequences(maybe + some 10 bp for promoters/UTR)
I am doing this using
views()
,coverage()
, andfindOverlaps()
from the IRanges package in R.
That way I can find something, but the point I do not like is the arbitrary cutoff, because of this I am also missing a lot of candidate peaks or if I set the threshold too low I will get lots of false positive. The data is not strand specific, and I have different samples but no biological replicates.
Do you have a better suggestion for selecting a possibly dynamic cutoff or for finding peaks in sequencing data, doesn't have to be in R, just a hint to an algortihm will do.
Thanks Sean, I will definitely look into this tool, too, though the 'documentation' is very rudimentary, but that might improve. After taking a quick glance, it seems to me that that Scripture contains nothing related my problem I couldn't do easier with R/Bioconductor, but maybe you can point this out?