Question

chromosome name in summarizeOverlaps for RNASeq data

0

Entering edit mode

8.9 years ago

BioProg • 0

For a while I used quantile normalization followed by t-test or cuffdiff for analysis of RNASeq. I would like to try DESeq2 given its better normalization method based on the readings. I am using your R package summarizeOverlaps to create count tables as I found it to be the easiest. However, I am encountering couple of issues. First,the ensemble-based gtf file that I used to align all my mouse RNAseq data has the chromosomes named as "1, 2 , 3"... Rather than "chr1 , chr2 ..". My Testing R code for summarizeOverlaps is as follows:

library(TxDb.Mmusculus.UCSC.mm10.ensGene)

TxDb_mm10=TxDb.Mmusculus.UCSC.mm10.ensGene
saveDb(TxDb_mm10, file="mm10.sqlite")
mm10<-loadDb("mm10.sqlite")
exbyGene<-exonsBy(mm10,by="gene") #I assume this is assigning things by GENES ids (ENSG..)
exbyTx<-exonsBy(mm10,by="tx") #I assume this is assigning things by Transcript IDS (ENST…)
fls <- list.files("Path/To/BamFiles", pattern=".accepted_hits.bam", full= TRUE)
grp1<- fls[1]
grp2 <- fls[c(2,3)]
bamLst_grp1<-BamFileList(grp1,yieldSize=100000)
bamLst_grp2<-BamFileList(grp2,yieldSize=100000)
head(seqlevels(TxDb_mm10)) # this shows that the mm10 sqlite I am using has chr1, chr2 .. Format
seqinfo(bamLst_grp1) #this shows that the chromosome names for my testing bam files are 1, 2, 3...
se_grp1<-summarizeOverlaps(exbyGene,bamLst_grp1,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE)
se_grp2<-summarizeOverlaps(exbyGene,bamLst_grp2,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE)
##Saving environment
saveRDS(se_grp1, file="se_grp1.Rdata")
saveRDS(se_grp2, file="se_grp2.Rdata")

While the script runs nicely (and please, advice me if I should optimize anything as I have basic understanding of each of the commands), when I tried to load the environment I am getting the following error:

Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'se_grp2.Rdata' has magic number 'X'
  Use of save versions prior to 2 is deprecated

Also, I am not sure if the error is because the chromosome names are incompatible between the reference and bam file. If that's the case, how can I correct for this?

RNA-Seq summarizeOverlaps annotation DESeq2 • 2.3k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 8.9 years ago by BioProg • 0

score 1 · Answer 1 · 2016-01-12

1

Entering edit mode

8.9 years ago

Devon Ryan 104k

The error is unrelated to any chromosome names and has to do with R not being able to load the file it saved. Presumably there was a (hopefully transient) storage problem.

ADD COMMENT • link 8.9 years ago by Devon Ryan 104k

0

Entering edit mode

You are right I should have used readRDS as i saved file as RDS.

Thank you

ADD REPLY • link 8.9 years ago by BioProg • 0