I would like to extract coding sequence (CDS) information from a TxDb object, but I am struggling to find a good solution. The TxDb object seems to hold a lot of information, yet there appears to be no way to get at it other than querying it like a database. This is troublesome because it takes more computational time than working directly with a GenomicRanges object.
```r
library(GenomicFeatures)
library(tidyverse)

# Build the TxDb from the GENCODE GFF3 once, save it, and reload it later
Gencode <- makeTxDbFromGFF("./gencode.v39.basic.annotation.gff3.gz")
saveDb(Gencode, file = "gencode.v39.basic.annotation.sqlite")
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")

# Get CDS by transcript:
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
```
With this command there doesn't appear to be a way to choose which information is returned; you are stuck with these columns:

```
seqnames ranges strand | cds_id cds_name exon_rank
```

even though `columns(Gencode)` returns 22 different columns.
I realize that I can use:

```r
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
keys <- names(CDSbyTx)
cols <- columns(Gencode)
# Call AnnotationDbi::select() explicitly: with the tidyverse attached, plain select() is dplyr::select()
AnnotationDbi::select(Gencode, keys = keys, columns = cols, keytype = "TXNAME")
```

but this returns a data.frame in which the CDSs are no longer grouped by transcript, and it feels really roundabout.

Is there any workaround?
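One way to keep the transcript grouping is to look the extra columns up once per transcript and attach them to the unlisted CDS ranges, then restore the list structure. The sketch below assumes the `gencode.v39.basic.annotation.sqlite` file saved above exists in the working directory, and picks `TXID` and `GENEID` purely as illustrative extra columns; it is not the only way to do this.

```r
library(GenomicFeatures)   # also attaches AnnotationDbi (loadDb(), select())

# Assumption: the SQLite file saved earlier in the question is in the working directory.
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)

# Look up extra columns once per transcript rather than once per CDS.
# Namespaced call so that dplyr::select() from the tidyverse cannot mask it.
ann <- AnnotationDbi::select(Gencode, keys = names(CDSbyTx),
                             columns = c("TXID", "GENEID"), keytype = "TXNAME")

# Flatten to a single GRanges; each range is named after its transcript.
flat <- unlist(CDSbyTx)
idx <- match(names(flat), ann$TXNAME)
mcols(flat)$tx_id <- ann$TXID[idx]
mcols(flat)$gene_id <- ann$GENEID[idx]
names(flat) <- NULL

# Restore the per-transcript grouping (same order and exon_rank as cdsBy()).
CDSbyTx_annot <- relist(flat, CDSbyTx)
```

The result is still a GRangesList keyed by transcript name, now with `tx_id` and `gene_id` next to `cds_id`, `cds_name` and `exon_rank`. If a tidy table is needed afterwards, `as.data.frame()` on a GRangesList adds `group` and `group_name` columns that can be passed to `dplyr::group_by()`.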
Are you just trying to get a GRanges object with all CDSs and additional info like tx_id and such?
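For that use case, `cds()` (unlike `cdsBy()`) takes a `columns` argument, so the extra fields can be requested directly. A minimal sketch, again assuming the saved `gencode.v39.basic.annotation.sqlite` file from the question; the chosen columns are only examples.

```r
library(GenomicFeatures)

# Assumption: the SQLite file saved earlier in the question is in the working directory.
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")

# Request extra metadata columns for every CDS range.
cds_gr <- cds(Gencode, columns = c("cds_id", "tx_name", "gene_id"))

# One CDS range can be shared by several transcripts, so tx_name is list-like
# (a CharacterList). Repeat each range once per transcript it belongs to...
tx <- mcols(cds_gr)$tx_name
cds_long <- cds_gr[rep(seq_along(cds_gr), lengths(tx))]
mcols(cds_long)$tx_name <- unlist(tx, use.names = FALSE)

# ...then group the ranges by transcript name.
cds_by_tx <- split(cds_long, mcols(cds_long)$tx_name)
```

Note that `split()` keeps the order of `cds_gr`, not `exon_rank`, so check the ordering (especially on the minus strand) before doing anything order-dependent such as concatenating CDS sequences.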
jon.klonowski Do not delete posts that have received feedback. Instead, interact with the people investing effort in your problem and if the problem resolved itself or their feedback helped, let everyone know.
@Ram It was a dumb question. Everything I needed was there; I just got a little scatterbrained.
That's OK, it happens to everyone. Just leave a comment saying what now-so-obvious thing you missed and I guarantee you, someone else will miss the exact same thing and find that your comment saves them at least a few hours.