For a while I used quantile normalization followed by t-test or cuffdiff for analysis of RNASeq. I would like to try DESeq2 given its better normalization method based on the readings. I am using your R package summarizeOverlaps to create count tables as I found it to be the easiest. However, I am encountering couple of issues. First,the ensemble-based gtf file that I used to align all my mouse RNAseq data has the chromosomes named as "1, 2 , 3"... Rather than "chr1 , chr2 ..". My Testing R code for summarizeOverlaps
is as follows:
library(TxDb.Mmusculus.UCSC.mm10.ensGene)
TxDb_mm10=TxDb.Mmusculus.UCSC.mm10.ensGene
saveDb(TxDb_mm10, file="mm10.sqlite")
mm10<-loadDb("mm10.sqlite")
exbyGene<-exonsBy(mm10,by="gene") #I assume this is assigning things by GENES ids (ENSG..)
exbyTx<-exonsBy(mm10,by="tx") #I assume this is assigning things by Transcript IDS (ENST…)
fls <- list.files("Path/To/BamFiles", pattern=".accepted_hits.bam", full= TRUE)
grp1<- fls[1]
grp2 <- fls[c(2,3)]
bamLst_grp1<-BamFileList(grp1,yieldSize=100000)
bamLst_grp2<-BamFileList(grp2,yieldSize=100000)
head(seqlevels(TxDb_mm10)) # this shows that the mm10 sqlite I am using has chr1, chr2 .. Format
seqinfo(bamLst_grp1) #this shows that the chromosome names for my testing bam files are 1, 2, 3...
se_grp1<-summarizeOverlaps(exbyGene,bamLst_grp1,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE)
se_grp2<-summarizeOverlaps(exbyGene,bamLst_grp2,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE)
##Saving environment
saveRDS(se_grp1, file="se_grp1.Rdata")
saveRDS(se_grp2, file="se_grp2.Rdata")
While the script runs nicely (and please, advice me if I should optimize anything as I have basic understanding of each of the commands), when I tried to load the environment I am getting the following error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'se_grp2.Rdata' has magic number 'X'
Use of save versions prior to 2 is deprecated
Also, I am not sure if the error is because the chromosome names are incompatible between the reference and bam file. If that's the case, how can I correct for this?
You are right I should have used readRDS as i saved file as RDS.
Thank you