Hi, I'm new to R and working through some existing code. I'm wanting to save a GRanges object to disk and then be able to reload it, preserving the metadata, so the object I restore to memory is the same as the one I saved to disk. The object I have in memory is as follows:
head(gr)
GRanges object with 6 ranges and 6 metadata columns:
seqnames ranges strand | TXID TXSTART TXEND
<Rle> <IRanges> <Rle> | <integer> <integer> <integer>
ENST00000456328.2 chr1 11869-14409 + | 1 11869 14409
ENST00000515242.2 chr1 11872-14412 + | 2 11872 14412
ENST00000518655.2 chr1 11874-14409 + | 3 11874 14409
ENST00000450305.2 chr1 12010-13670 + | 4 12010 13670
ENST00000473358.1 chr1 29554-31097 + | 5 29554 31097
ENST00000469289.1 chr1 30267-31109 + | 6 30267 31109
GENEID TXNAME TxWidth
<CharacterList> <character> <integer>
ENST00000456328.2 ENSG00000223972.4 ENST00000456328.2 1657
ENST00000515242.2 ENSG00000223972.4 ENST00000515242.2 1653
ENST00000518655.2 ENSG00000223972.4 ENST00000518655.2 1483
ENST00000450305.2 ENSG00000223972.4 ENST00000450305.2 632
ENST00000473358.1 ENSG00000243485.2 ENST00000473358.1 712
ENST00000469289.1 ENSG00000243485.2 ENST00000469289.1 535
-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths
I save it to a txt file as follows:
df = as(gr, "data.frame")
write.table(df, filename, sep = "\t", row.names = TRUE, col.names = TRUE, quote = FALSE)
I then read it back in as follows:
df2 = read.table(filename, header = TRUE, sep = "\t", dec = ".")
gr2 = makeGRangesFromDataFrame(df2, keep.extra.columns=TRUE)
head(gr2)
resulting in:
GRanges object with 6 ranges and 6 metadata columns:
seqnames ranges strand | TXID TXSTART TXEND
<Rle> <IRanges> <Rle> | <integer> <integer> <integer>
ENST00000456328.2 chr1 11869-14409 + | 1 11869 14409
ENST00000515242.2 chr1 11872-14412 + | 2 11872 14412
ENST00000518655.2 chr1 11874-14409 + | 3 11874 14409
ENST00000450305.2 chr1 12010-13670 + | 4 12010 13670
ENST00000473358.1 chr1 29554-31097 + | 5 29554 31097
ENST00000469289.1 chr1 30267-31109 + | 6 30267 31109
GENEID TXNAME TxWidth
<factor> <factor> <integer>
ENST00000456328.2 ENSG00000223972.4 ENST00000456328.2 1657
ENST00000515242.2 ENSG00000223972.4 ENST00000515242.2 1653
ENST00000518655.2 ENSG00000223972.4 ENST00000518655.2 1483
ENST00000450305.2 ENSG00000223972.4 ENST00000450305.2 632
ENST00000473358.1 ENSG00000243485.2 ENST00000473358.1 712
ENST00000469289.1 ENSG00000243485.2 ENST00000469289.1 535
-------
seqinfo: 25 sequences from an unspecified genome; no seqlengths
GENEID and TXNAME are now listed as 'factor'. Am I saving and loading correctly? If so how do I convert GENEID to 'CharacterList' and TXNAME to 'character' ?
For TXNAME it seems I can use: gr2$TXNAME = as.character(gr2$TXNAME)
But typeof(gr$GENEID)
says 'S4' and typeof(gr2$GENEID)
says 'integer' so I have no idea how to handle this.
If I have restored the same object to memory I would expect GENEID to be listed as 'CharacterList' and typeof to say 'S4'
Thank you
See
?saveRDS
.perfect! thank you, that's just what I need