3.5 years ago
iibrams07
I have a set of genes, apparently with assigned hgnc_symbol values, that I retrieved with biomaRt. It turns out that many of the corresponding Ensembl gene IDs are missing, i.e. replaced with NaN. How can there be no Ensembl ID for so many genes? Is there a way to find them?
I used the following command to retrieve the data:
results <- getBM(attributes=c("ensembl_gene_id","hgnc_symbol","transcript_biotype"),filters = c("transcript_biotype"), values=list("protein_coding"), mart=ensembl)
Thanks
Different organisations have different rules about what gets annotated. My guess is that the genes in question either have little evidence of existing at all, or correspond to obscure transcripts that HGNC may not yet have reviewed. Can you please paste some example IDs?
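To collect a few example rows with a missing Ensembl gene ID for posting, here is a minimal base-R sketch. It assumes `results` is the data frame returned by the `getBM()` call above; the helper name `missing_ids` is hypothetical, and the toy data frame below is made-up placeholder data standing in for the real query result. Because "missing" can show up as `NA`, `NaN`, or an empty string depending on how the table was loaded, the helper treats all three as missing.

```r
# Hypothetical helper (not part of biomaRt): return the rows of a getBM()
# result whose ID column is missing. NA, NaN and empty strings all count as
# missing, since the representation depends on how the table was loaded.
missing_ids <- function(results, col = "ensembl_gene_id") {
  ids <- as.character(results[[col]])
  is_missing <- is.na(ids) | trimws(ids) %in% c("", "NA", "NaN")
  results[is_missing, , drop = FALSE]
}

# Made-up placeholder data standing in for the real getBM() output.
results <- data.frame(
  ensembl_gene_id    = c("ENSG_PLACEHOLDER_1", "", NA),
  hgnc_symbol        = c("GENE_A", "GENE_B", "GENE_C"),
  transcript_biotype = "protein_coding",
  stringsAsFactors   = FALSE
)

# Print the first few affected rows so they can be pasted into the thread.
print(head(missing_ids(results)))
```

With the real data, `head(missing_ids(results))` should give a handful of affected rows, which is usually enough for someone to check them against Ensembl and HGNC by hand.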