Hi,
I am trying to add a column of gene IDs next to my gene symbols. normalised_mouse3 is my count matrix for RNA-seq analysis. I've used this code:
library('biomaRt')
normalised_mouse_biomart <- read.delim("normalised_mouse3.txt")
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
Genes <- normalised_mouse_biomart$V2
ensLookup <- gsub("\\.[0-9]*$", "", Genes)
G_list <- getBM(filters    = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "mgi_symbol"),
                values     = ensLookup,
                mart       = mart)
mouse <- cbind(G_list$mgi_symbol, normalised_mouse_biomart)
G_list contains the correct list of gene IDs and gene symbols, but I get an error when merging the gene IDs with the count matrix:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 54439, 54446
I don't understand how 7 gene names are missing from the G_list when I've given the correct number of gene symbols.
Well for starters, how many genes are in ensLookup? How many unique genes in G_list? Is there any possibility that the gsub is mangling a few gene names?
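Here is a rough sketch of those checks. The IDs and symbols below are made-up stand-ins for your ensLookup and G_list, not real query results. As far as I know, getBM returns one row per match rather than one row per input value, so IDs it cannot find are dropped and the row order is not guaranteed to follow your input. That is why cbind by position is risky and matching by ID is safer:

```r
# Toy stand-ins for the objects in the question (hypothetical IDs and symbols)
ensLookup <- c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03", "ENSMUSG03", "ENSMUSG04")
G_list <- data.frame(
  ensembl_gene_id  = c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03"),
  mgi_symbol       = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

n_query    <- length(ensLookup)          # IDs sent in, one per row of the count matrix
n_unique   <- length(unique(ensLookup))  # distinct IDs among them
n_returned <- nrow(G_list)               # rows that came back from the lookup

# IDs that appear more than once in the query
duplicated_ids <- unique(ensLookup[duplicated(ensLookup)])

# IDs that were queried but are absent from the result
missing_ids <- setdiff(ensLookup, G_list$ensembl_gene_id)

# Align symbols to the count matrix by ID, not by position;
# IDs without a match get NA instead of shifting every row below them
symbol <- G_list$mgi_symbol[match(ensLookup, G_list$ensembl_gene_id)]
```

With your real data, the resulting symbol vector has the same length as ensLookup, so something like cbind(symbol, normalised_mouse_biomart) should no longer complain about differing row counts, and missing_ids tells you which genes were not returned.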
If you want to store metadata for your genes alongside the read counts for those features in your samples, you might be better off storing your data in an edgeR::DGEList.
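A minimal sketch of what that could look like, assuming edgeR is installed. The counts, sample names and annotation below are invented for illustration. The genes data frame must have one row per row of the count matrix, so the annotation is aligned by ID first:

```r
# Hypothetical count matrix: rows are genes, columns are samples
counts <- matrix(c(10, 0, 5, 3, 8, 2), nrow = 3,
                 dimnames = list(c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03"),
                                 c("s1", "s2")))

# Hypothetical annotation, deliberately incomplete and in a different order
annot <- data.frame(
  ensembl_gene_id  = c("ENSMUSG03", "ENSMUSG01"),
  mgi_symbol       = c("GeneC", "GeneA"),
  stringsAsFactors = FALSE
)

# Build a per-gene table in the same row order as the counts
genes <- data.frame(
  ensembl_gene_id  = rownames(counts),
  mgi_symbol       = annot$mgi_symbol[match(rownames(counts), annot$ensembl_gene_id)],
  stringsAsFactors = FALSE
)

# Only build the DGEList if edgeR is available
if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts, genes = genes)
  # Subsetting rows, e.g. dge[1:2, ], keeps dge$counts and dge$genes in sync
}
```

The point of this design is that the annotation travels with the counts, so filtering out genes later cannot leave the two out of step.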