Hi everyone
This is my first post, and I am currently learning Bioconductor.
Does anyone know why there is a difference between the total number of genes in GENCODE Release 43, which is 62703 (https://www.gencodegenes.org/human/stats_43.html), and the number of genes I get when I extract them with Bioconductor using the following code?
gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf')
dtgtf_44<-data.frame(gtf_43)
genes <- unique(obj_43[ ,c("gene_id","gene_name")])
nrow(genes)
The result is 62757
Thank you in advance
obj_43 is not defined in this code chunk, which decreases my confidence in the result. Just count the unique gene IDs.
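A minimal sketch of that count, using a small made-up table in place of the imported GTF. The rows and IDs are hypothetical; with the real file, `gtf` would be the data frame built from the `rtracklayer::import()` result. It counts unique `gene_id` values and, separately, rows whose `type` is "gene", since the two numbers are worth comparing.

```r
# Toy stand-in for the data frame built from rtracklayer::import(<gtf file>).
# All rows and IDs are invented for illustration only.
gtf <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  type     = c("gene", "transcript", "exon", "gene", "transcript"),
  gene_id  = c("G1.1", "G1.1", "G1.1", "G2.3", "G2.3"),
  stringsAsFactors = FALSE
)

# Number of distinct gene IDs across all feature rows
n_unique_ids <- length(unique(gtf$gene_id))

# Number of rows that are gene-level features
n_gene_rows <- sum(gtf$type == "gene")
```

If the two numbers disagree on the real file, that alone would point to IDs being reused or to non-gene rows carrying IDs that have no gene-level row.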
Hello, thanks for replying. I pasted the wrong code by mistake; the following is the corrected version:
gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf')
dtgtf_43<-data.frame(gtf_43)
genes <- unique(dtgtf_43[ ,c("gene_id","gene_name")])
geneId <- unique(dtgtf_43[ ,c("gene_id")])
length(geneId) ## result 62757
nrow(genes) ## result
The following is the correct code
gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf')
dtgtf_43<-data.frame(gtf_43)
nrow(dtgtf_43)
genes <- unique(dtgtf_43[ ,c("gene_id","gene_name")])
geneId <- unique(dtgtf_43[ ,c("gene_id")])
length(geneId)
nrow(genes)
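One way to look into where the extra IDs come from is to split the gene-level rows by sequence name. This is only a sketch on made-up rows, and it rests on an assumption not confirmed in this thread: that the primary assembly file also annotates unplaced scaffolds (names such as GL... or KI...), while the statistics page may count only the main chromosomes.

```r
# Toy stand-in for dtgtf_43; rows and gene IDs are invented for illustration.
gtf <- data.frame(
  seqnames = c("chr1", "chr2", "chrX", "GL000009.2", "KI270711.1"),
  type     = rep("gene", 5),
  gene_id  = c("G1.1", "G2.1", "G3.1", "G4.1", "G5.1"),
  stringsAsFactors = FALSE
)

# Keep gene-level rows only
genes_only <- gtf[gtf$type == "gene", ]

# Assumption: main chromosomes are named chr*, everything else is a scaffold
on_chr <- grepl("^chr", as.character(genes_only$seqnames))

# Tabulate genes on chromosomes versus scaffolds
counts <- table(ifelse(on_chr, "chromosome", "scaffold"))
```

On the real data, replacing the toy `gtf` with `dtgtf_43` and comparing the "chromosome" count with 62703 would show whether scaffold genes explain the gap.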