Hello!
I need to know how many protein-coding genes are on the Y chromosome. First, I tried filtering the GTF file in R with this code:
#load gtf file
gtf <- rtracklayer::import('~/lapd/Index_hum/ann/Homo_sapiens.GRCh38.104.gtf')
gtf_df=as.data.frame(gtf)
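# keep gene-level records annotated as protein-coding
library(dplyr)
gtf_filt <- filter(gtf_df, type == 'gene', gene_biotype == 'protein_coding')
# rtracklayer stores the chromosome in the 'seqnames' column, not 'chromosome_name'
chrY <- filter(gtf_filt, seqnames == 'Y')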
This gave me 46 coding genes. I thought I had made a mistake, so I tried biomaRt:
library(biomaRt)
ensembl = useMart("ensembl", dataset= "hsapiens_gene_ensembl")
new=getBM(attributes=c("chromosome_name","ensembl_gene_id"), filters='biotype', values=c('protein_coding'), mart=ensembl)
chrY=filter(new, chromosome_name == 'Y')
This also gave 46 coding genes. When I compared the number of protein-coding genes in the Ensembl statistics with the number in my annotation file from Ensembl, I found a second problem: the Ensembl statistics (http://www.ensembl.org/Homo_sapiens/Location/Genome?r=MT) list 20,442 coding genes, but the annotation file contains 19,937.
Where is my mistake? Or is this normal?
The Y chromosome shows 64 protein-coding genes.
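
One way to narrow down where the counts diverge is to tally protein-coding genes per sequence name and compare each row with the Ensembl statistics. This is a minimal sketch, assuming a gene-level table shaped like gtf_df above; the toy rows and the names toy and per_chr are made up for illustration and are not real annotation data:

```r
library(dplyr)

# Toy stand-in for the gene-level GTF table (hypothetical rows, not real data)
toy <- data.frame(
  seqnames = c("1", "1", "Y", "Y", "MT"),
  type = "gene",
  gene_biotype = c("protein_coding", "lncRNA", "protein_coding",
                   "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Count protein-coding genes per chromosome / scaffold
per_chr <- toy %>%
  filter(type == "gene", gene_biotype == "protein_coding") %>%
  count(seqnames, name = "n_protein_coding")

print(per_chr)
```

Replacing toy with the real gtf_df should give one row per chromosome or scaffold, which makes it easier to see whether the difference is spread across the genome or concentrated in particular sequences.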