I have ~22144 human genes from RNA-seq data, each with RefSeq, Ensembl, and UCSC known gene IDs, a gene symbol, and a description. I am trying to get the percentage of genes in each small non-coding RNA group (snoRNA, snRNA, scRNA, piRNA, microRNA). Since every gene has a symbol and a description, I can roughly tell which group a gene belongs to by looking at it, but I would like exact percentages: how many genes are snoRNAs, how many are snRNAs, and so on.
How can I get such information?
And just by looking at the ~2100 genes, how would I know which gene falls in which category? For example, I know that all genes starting with MIR are microRNAs. What about the others?
From Ensembl BioMart, download the gene symbol together with its biotype; that gives you the category for every gene. The same information can also be found in the annotation feature file (GTF/GFF).
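If you go the GTF route, the counting could look roughly like the sketch below. It assumes an Ensembl-style GTF where each `gene` line carries a `gene_biotype "..."` attribute (GENCODE files use `gene_type` instead, so the pattern accepts both). The function name `count_gtf_biotypes` and the two sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches gene_biotype "snoRNA" (Ensembl) or gene_type "snoRNA" (GENCODE).
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')

def count_gtf_biotypes(lines):
    """Count genes per biotype from an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Two invented records, only to show the expected input shape.
    sample = [
        '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "snoRNA";',
        '1\tensembl\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "miRNA";',
    ]
    counts = count_gtf_biotypes(sample)
    total = sum(counts.values())
    for biotype, n in counts.most_common():
        print(f"{biotype}\t{n}\t{100.0 * n / total:.2f}%")
```

For a real file you would pass an open file handle, e.g. `count_gtf_biotypes(open("annotation.gtf"))`, where the file name is a placeholder.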
Thank you, Prasad, for the reply. Could you kindly explain in a bit more detail how to do this? It is something new for me.
Go to Ensembl.
BioMart → Ensembl Genes → select the organism → Attributes
In the Attributes section, select all the information you want. Gene type (the gene biotype) is the one you are looking for.
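Once the BioMart table is exported as a tab-separated file, the percentages for your own gene list could be computed along these lines. This is only a sketch: the column headers `Gene name` and `Gene type` are assumptions about how the export is labelled (adjust them to whatever your file actually contains), and the function name `biotype_percentages` and the demo rows are hypothetical.

```python
import csv
import io
from collections import Counter

def biotype_percentages(tsv_file, my_symbols,
                        name_col="Gene name", type_col="Gene type"):
    """Return {biotype: percent of my_symbols} from a BioMart TSV export."""
    wanted = set(my_symbols)
    symbol_to_type = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        symbol = row[name_col]
        if symbol in wanted:
            # Keep the first biotype seen if a symbol appears more than once.
            symbol_to_type.setdefault(symbol, row[type_col])
    counts = Counter(symbol_to_type.values())
    total = len(wanted)  # symbols missing from the export still count in the total
    return {biotype: 100.0 * n / total for biotype, n in counts.items()}

if __name__ == "__main__":
    # Invented rows standing in for a real export file.
    export = io.StringIO(
        "Gene name\tGene type\n"
        "GENE_A\tsnoRNA\n"
        "GENE_B\tmiRNA\n"
        "GENE_C\tprotein_coding\n"
    )
    result = biotype_percentages(export, ["GENE_A", "GENE_B", "GENE_C"])
    for biotype, pct in sorted(result.items()):
        print(f"{biotype}\t{pct:.2f}%")
```

Symbols in your list that are not found in the export get no biotype but remain in the denominator, so the reported percentages may sum to less than 100.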