Entering edit mode
3.5 years ago
wrab425
▴
50
I did a whole series of alignments of bam files to my genome using tools in the following R libraries in Bioconductor
library(GenomicRanges)
library(rtracklayer)
library(Rsamtools)
library(limma)
library(GenomicAlignments)
I used the following commands to create coverage vectors:
Cov_1 <- coverage(alignGR_1)
smoothed the vectors and plotted them out.:
plot(smoothCov_1_a, col="red", type="l", ylab="Number of reads", xlab="bp from range start")
All worked well.
I exported the coverage vectors so that I could analyse them with respect to sequence features but when I look inside them I did not find numbers but unicode characters. Seems like a silly mistake but I am lost as to what it might be?
What was the function call you invoked to export the data?
Thanks Dunois; I established that I had been working with GenomicRanges Rle object and needed to convert it to a numerical vector which using as.numeric() and after that I simply used the write() command to recover it as a simple text vector outside R.
I don't think
as.numeric()
would have worked had you passed it theRle
object directly. Did you get aNAs introduced by coercion
warning when you tried?I think the right way would probably be
as.numeric(decode(x))
or something to that effect (wherex
is theRle
object). Or potentiallyas.numeric(as.vector(x))
. I think this would depend on what is contained in the object. If you could give us an indication of that, it'd be helpful.I suggest reading through the help documentation for
Rle
objects (?GenomicRanges::Rle
). Specifically the From Rle to other objects subsection under the section Coercion.Passing the Rle object to as.numeric() worked fine with no warnings or NAs introduced by coercion . I looked inside the Rle object and it was just a list of unicode characters. The output of as.numeric() is just a list of numbers. I am happy to send you anything you would like to see; please just let me know!
That sounds a bit odd. To the best of my knowledge,
as.numeric()
cannot handle Unicode characters. I can't really tell what's going on. Could you share some anonymized code and dummy data perhaps? Would be nice if you could cover everything including the smoothening function you mentioned in the OP.Load the libraries required:
library(GenomicRanges) library(rtracklayer) library(Rsamtools) library(limma) library(GenomicAlignments)
Create a BamFileList object containing details of the BAM files we wish to analyse
dataDir <- "directory_name" bamFiles <- dir(file.path(getwd(), dataDir), pattern="*filtered.sorted.bam$", full.name=T) names(bamFiles) <- gsub(".bam","",basename(bamFiles))
bfList <- BamFileList(bamFiles)
Extract the sam header for a single bam file and examine the structure of the list using the str() function
samHeader_1 <- scanBamHeader(path(bfList["File_name"])) str(samHeader_1,max.level=2)
samHeader_2 <- scanBamHeader(path(bfList["File_name"])) str(samHeader_2,max.level=2)
Explore the information provided in the sam header.
samHeader_1[[1]]$targets
samHeader_2[[1]]$targets
Extract information about chromosome 2 coordinates to set up filtering parameters using the ScanBamParam() function.
chrdat_1 <- samHeader_1[[1]]$targets["File_name"]
chrrange_1 <- GRanges(seqnames=names(chrdat_1),ranges=IRanges(1,chrdat_1)) param_1 <- ScanBamParam(which=chrrange_1)
chrdat_2 <- samHeader_2[[1]]$targets["File_name"]
chrrange_2 <- GRanges(seqnames=names(chrdat_2),ranges=IRanges(1,chrdat_2)) param_2 <- ScanBamParam(which=chrrange_2)
Having selected an area of interest, read in read data from a single BAM file using the parameters set using the readGAlignments() function. Convert this to a GenomicRanges object and inspect the data contained within. Calculate the average read length of the selected reads (in case some reads were trimmed).
alignDat_1 <- readGAlignments(path(bfList["File_name"]),param=param_1) alignGR_1 <- granges(alignDat_1) seqlevels(alignGR_1) <- "File_name"
alignDat_2 <- readGAlignments(path(bfList["File_name"]),param=param_2) alignGR_2 <- granges(alignDat_2) seqlevels(alignGR_2) <- "File_name"
Use the coverage() function to create coverage vectors.
strand(alignGR_1) Cov_1 <- coverage(alignGR_1)
strand(alignGR2) Cov_2 <- coverage(alignGR_2)
Extract coverage values for the region of interest.
coord <- c(123301:129401)
the fllowing are normalizations
Cov_1_a<- Map("/", Cov_1, 201035.1129) Cov_2_a<- Map("/", Cov_2 , 257977.1543)
subCov_1_a <- Cov_1_a$File_name[coord] subCov_2_a <- Cov_2_a$File_name[coord]
Thank you for sharing the code with me. I think you just pasted the code above (instead of inside) the code markdown area denoted by a `
` and \``` respectively: the place where it says
enter code here` in your post right now.So I ran some "dummy" data through your code, and it seems to work fine as far as I can tell. The data I used is from here. The
coverage()
function worked. I did not get any Unicode characters in the vector.As far as I understand,
coverage()
produces aRleList
ofRle
elements (so a list of run vectors; see here). The accessor functionsrunLength()
andrunValue()
could be used with anRle
object to create adata.frame
of the run vector like so:The data.frame could then be exported to file with an appropriate file name and all that jazz. I have a small function down here that automates this for every
Rle
element in aRleList
produced bycoverage()
:Is this what you were hoping to do?
Thanks Dunois, I have been away writing my paper so have not had the chance to thank you. I ran the as.numeric and it came out as a file in which the numbers were clustered in groups of five. I pulled these apart using a PERL script and got what I needed but it was messy. I know need these vectors for GEO so I shall try what you suggest. I have many files to crack through and will let you know how it goes. William
No worries, and good luck. I hope it goes well!! Looking forward to hearing that it all worked out!!
I am having trouble getting the formatting and hope that you can see the key lines. If you need bam file then just ask but they are huge.
I looked at https://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/GenomicRanges/html/GRanges-class.html and there was no mention of a need for decode(x) under the entry for coercion.
Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.Thanks; I did not know about this. I will use it in future.