3.6 years ago
fifty_fifty
I have to convert the gene names in my scRNA-seq data into Ensembl IDs for downstream analyses. I used the biomaRt package, which converted some of the gene names:
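library(biomaRt)
# Connect to the current Ensembl human gene mart
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Map each gene symbol (row name of the count matrix) to its Ensembl gene ID(s)
biomart_hgnc <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id"),
                      filters = "hgnc_symbol",
                      values = rownames(LeeCRCtumor),
                      bmHeader = TRUE, mart = ensembl)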
However, it returns several Ensembl IDs for some genes, for example:
Should I specify the gene location/chromosome in this case?
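A quick way to see which symbols are affected is to count the distinct Ensembl IDs returned per symbol. The sketch below is base R and assumes the getBM() query was run with the default bmHeader = FALSE, so the columns are named hgnc_symbol and ensembl_gene_id; find_multi_mapped is a hypothetical helper name.

```r
# Sketch: list the symbols that map to more than one Ensembl gene ID.
# Assumes `map` is a getBM() result retrieved with bmHeader = FALSE, so the
# columns are named "hgnc_symbol" and "ensembl_gene_id".
find_multi_mapped <- function(map) {
  # Drop duplicate symbol/ID pairs before counting.
  pairs <- unique(map[, c("hgnc_symbol", "ensembl_gene_id")])
  # Number of distinct IDs per symbol.
  n_ids <- table(pairs$hgnc_symbol)
  # Keep only the ambiguous symbols, with all of their IDs.
  multi <- names(n_ids)[n_ids > 1]
  pairs[pairs$hgnc_symbol %in% multi, , drop = FALSE]
}
```

Adding "chromosome_name" to the attributes of the same query then shows where each of the competing IDs is located.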
See: How to deal with the case that one gene symbol matches multiple ensembl ids?
Yes, I understand that one gene can have several Ensembl IDs. But in my case, I only have a single-cell RNA-seq count matrix with gene names, which I got from the NCBI database; I don't have any FASTQ files or other raw data. I am trying to find a way to convert the gene names to Ensembl IDs, so I think I need to restrict the biomaRt mapping so that it excludes genes in haplotypic regions. I was wondering whether biomaRt has that functionality.
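getBM() accepts several filters at once (passed as a vector of filter names with a matching list of value vectors), and they are combined. A minimal sketch, assuming the chromosome_name filter of the hsapiens_gene_ensembl dataset accepts the primary chromosome names (1-22, X, Y, MT), so that genes placed on haplotype or patch scaffolds are left out; map_symbols_primary is a hypothetical helper name, and biomaRt::listFilters(mart) can be used to confirm the filter names for your release.

```r
# Sketch: query biomaRt only for genes on the primary chromosomes.
# Assumption: "chromosome_name" exists as both a filter and an attribute in
# the hsapiens_gene_ensembl dataset (check with biomaRt::listFilters(mart)).
map_symbols_primary <- function(symbols, mart,
                                chroms = c(as.character(1:22), "X", "Y", "MT")) {
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "ensembl_gene_id", "chromosome_name"),
    # One value vector per filter, in the same order as `filters`.
    filters = c("hgnc_symbol", "chromosome_name"),
    values = list(symbols, chroms),
    mart = mart
  )
}

# Usage (requires network access to Ensembl):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# ids <- map_symbols_primary(rownames(LeeCRCtumor), ensembl)
```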
For the two examples above:
197953 is the main gene.
261846 is the alternate sequence gene. So you could filter your lists to keep only genes on the main chromosomes.
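If the table has already been retrieved with chromosome_name as an extra attribute, the filtering can also be done afterwards in base R. A sketch, assuming the columns are named hgnc_symbol, ensembl_gene_id and chromosome_name; keep_primary is a hypothetical name. It also warns when a symbol still maps to more than one ID after filtering, since some symbols may have several IDs even on the primary chromosomes.

```r
# Sketch: keep only rows whose chromosome_name is a primary chromosome.
# Assumes columns "hgnc_symbol", "ensembl_gene_id" and "chromosome_name".
keep_primary <- function(map, chroms = c(as.character(1:22), "X", "Y", "MT")) {
  kept <- map[map$chromosome_name %in% chroms, , drop = FALSE]
  # Report symbols that remain ambiguous after the chromosome filter.
  dup <- unique(kept$hgnc_symbol[duplicated(kept$hgnc_symbol)])
  if (length(dup) > 0) {
    warning("Still multi-mapped: ", paste(dup, collapse = ", "))
  }
  kept
}
```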
Yes, I filtered out the genes that are not on the main chromosomes, using ensembldb followed by several biomaRt filters. However, some remaining genes were not recognized by either method. Many of them start with RP11, and some I couldn't find at all, e.g. CH17-212P11.4. Do you know how to convert them into Ensembl IDs?
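Names such as RP11-... and CH17-212P11.4 look like clone-based names, which older Ensembl/GENCODE releases used for many unnamed genes (mostly lncRNAs) before they received other names; they are not HGNC symbols, so an hgnc_symbol filter will not match them. One thing to try, sketched below under that assumption, is to collect the unmapped names and query the GRCh37 Ensembl BioMart on external_gene_name. The helper names are hypothetical, the useEnsembl(GRCh = 37) arguments follow recent biomaRt versions, some names may not be found in any release, and IDs obtained this way should be checked against the Ensembl release used for the rest of the analysis.

```r
# Sketch: find symbols that are still unmapped, then retry them against the
# GRCh37 Ensembl BioMart, where clone-based names (e.g. "RP11-...") may still
# be stored as the gene display name. Not guaranteed to find every name.
unmapped_symbols <- function(symbols, map, symbol_col = "hgnc_symbol") {
  setdiff(unique(symbols), map[[symbol_col]])
}

map_legacy_names <- function(symbols) {
  # Connect to the GRCh37 site (network access required).
  grch37 <- biomaRt::useEnsembl(biomart = "genes",
                                dataset = "hsapiens_gene_ensembl",
                                GRCh = 37)
  # Match on the Ensembl display name instead of the HGNC symbol.
  biomaRt::getBM(attributes = c("external_gene_name", "ensembl_gene_id"),
                 filters = "external_gene_name",
                 values = symbols,
                 mart = grch37)
}

# Usage (requires network access):
# left <- unmapped_symbols(rownames(LeeCRCtumor), ids)
# legacy <- map_legacy_names(left)
```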