Entering edit mode
12.6 years ago
buchnerd
▴
70
I'm trying to run DEXseq but am having trouble with the newExonCountSet function because my geneid list and exonid list are different sizes. When I unlist the mm10 geneid dataset from UCSC annotated with Ensembl Genes (ensGene) it yields 351,862 geneids even though the corresponding exonid dataset contains only 350,630 IDs. I had a similar problem when trying to annotate the UCSC database with RefSeq Genes (tablename = 'refGene'). Any thoughts on why there is a difference and how I can get the GeneID and Exon_ID datasets to match up? Thanks!
library("GenomicRanges")
library("Rsamtools")
library("GenomicFeatures")
library("rtracklayer")
txdb <- makeTranscriptDbFromUCSC(genome="mm10", tablename = "ensGene")
#Getting rid of strand information
exonRangesList <- exons(txdb)
strand(exonRangesList) <- '*'
exonRangesNoStrand <- split(exonRangesList)
ExonIDs <- exons(txdb, vals=NULL, columns=c('gene_id', 'exon_id'))
GeneIDList <- elementMetadata(ExonIDs)
GeneIDVector <- GeneIDList$gene_id
ExonIDVector <- GeneIDList$exon_id
GeneIDVector2 <-unlist(GeneIDVector)