Hi Biostars,
I have a list of genes with Ensembl gene ids and I want to filter out the non-coding genes and keep only the protein-coding ones. I am using the Bioconductor package biomaRt, but can't find a direct way. Any suggestions?
Many thanks!
I used the Transcript Biotype attribute to check whether each gene is actually protein coding or not. It is pretty straightforward with BioMart.
Can you post the R script you used to figure this out? I am having a terribly hard time trying to do something similar here.
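Use the biomaRt vignette. If you have Ensembl gene ids it is pretty easy. Let x be a character vector of your Ensembl gene ids. The 'ensembl_gene_id' filter expects ids without the version suffix, so strip it first or filter on 'ensembl_gene_id_version' instead.
library(biomaRt)
# assumption: human genes; pick the dataset that matches your species
ensembl = useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
goids = getBM(attributes = c('ensembl_gene_id', 'gene_biotype'),
              filters = 'ensembl_gene_id',
              values = x,
              mart = ensembl)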
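To get from that table to a protein-coding-only list, here is a minimal sketch of the two remaining steps: dropping version suffixes from the ids before the query, and subsetting the result on gene_biotype. The ids in x and the hand-typed goids table are stand-ins I made up so the snippet runs without a network connection; in practice goids would be whatever getBM() returns, and x_clean, coding and coding_ids are just names chosen for this example.

```r
# Hypothetical input: Ensembl gene ids, some carrying a version suffix.
x <- c("ENSG00000141510.18", "ENSG00000139618", "ENSG00000251562.8")

# Drop a trailing ".<digits>" version; ids without a version are left unchanged.
x_clean <- sub("\\.[0-9]+$", "", x)

# Hand-typed stand-in for the data frame returned by getBM()
# (biotype values are illustrative, not fetched from Ensembl).
goids <- data.frame(
  ensembl_gene_id = x_clean,
  gene_biotype = c("protein_coding", "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE
)

# Keep only the rows annotated as protein_coding.
coding <- goids[goids$gene_biotype == "protein_coding", ]
coding_ids <- coding$ensembl_gene_id
print(coding_ids)
```

With real data you would pass x_clean as the values argument of getBM() and run the subsetting line on its result.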
Below is the information for gene biotypes.
https://useast.ensembl.org/info/genome/genebuild/biotypes.html
Also, I think you should add your post as a comment rather than an answer.
Can you share the R script you used to figure this out? I am trying to do something similar now and am having trouble.