How To Get Equal Numbers Of Gene_Id And Exon_Id For Use In Newexoncountset Function For Dexseq
0
2
Entering edit mode
12.6 years ago
buchnerd ▴ 70

I'm trying to run DEXseq but am having trouble with the newExonCountSet function because my geneid list and exonid list are different sizes. When I unlist the mm10 geneid dataset from UCSC annotated with Ensembl Genes (ensGene) it yields 351,862 geneids even though the corresponding exonid dataset contains only 350,630 IDs. I had a similar problem when trying to annotate the UCSC database with RefSeq Genes (tablename = 'refGene'). Any thoughts on why there is a difference and how I can get the GeneID and Exon_ID datasets to match up? Thanks!

library("GenomicRanges")
library("Rsamtools")
library("GenomicFeatures")
library("rtracklayer")

txdb <- makeTranscriptDbFromUCSC(genome="mm10", tablename = "ensGene")

#Getting rid of strand information
exonRangesList <- exons(txdb)
strand(exonRangesList) <- '*'
exonRangesNoStrand <- split(exonRangesList)

ExonIDs <- exons(txdb, vals=NULL, columns=c('gene_id', 'exon_id'))
GeneIDList <- elementMetadata(ExonIDs)
GeneIDVector <- GeneIDList$gene_id
ExonIDVector <- GeneIDList$exon_id
GeneIDVector2 <-unlist(GeneIDVector)
rna-seq • 4.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 1292 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6