6.3 years ago
sahar850
Hi,
I need to convert TCGA data from versioned Ensembl gene IDs to HGNC symbols using the biomaRt R package. After creating a data frame containing all the Ensembl gene IDs, I tried this loop:
for (i in 1:length(data[,1])) {
  data[i,1] <- getBM(attributes = c('hgnc_symbol'), filters = 'ensembl_gene_id',
                     values = sub("\\..*", "", data[i,1]), mart = ensembl)
}
But I keep getting this error message:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
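A likely explanation, assuming `getBM()` returns a zero-row data frame for an ID it cannot map: the loop assigns that result into a single cell of `data`, and `[<-.data.frame` rejects a zero-length replacement. The base-R sketch below reproduces the error offline; `no_match` is a hypothetical stand-in for such a `getBM()` result, not real biomaRt output.

```r
# Hypothetical stand-in for a getBM() result with no matching rows
no_match <- data.frame(hgnc_symbol = character(0), stringsAsFactors = FALSE)

# Toy table standing in for the TCGA identifier column
data <- data.frame(id = "ENSG00000036549.11", stringsAsFactors = FALSE)

# Assigning a zero-row data frame into one cell fails inside `[<-.data.frame`;
# the error message is captured instead of stopping the script
msg <- tryCatch({
  data[1, 1] <- no_match
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```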
I also tried this code:
hgnc_id <- getBM(attributes=c('hgnc_symbol'),filters = 'ensembl_gene_id_version', values = data[,1], mart = ensembl)
In this case I only get 15000 out of the 60000 genes
hgnc_id <- getBM(attributes=c('hgnc_symbol'),filters = 'ensembl_gene_id', values = sub("\\..*", "", data[,1]), mart = ensembl)
In this case I only get 30000 out of the 60000 genes
Has anyone had a similar problem, or can anyone offer a solution?
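A sketch of a vectorized alternative, assuming the goal is one symbol per input row: ask `getBM()` for `ensembl_gene_id` alongside `hgnc_symbol`, so each returned symbol stays linked to its ID, and then match the result back onto the original IDs. When only `hgnc_symbol` is requested, `getBM()` returns unique rows by default (`uniqueRows = TRUE`), with nothing tying a symbol to its input, which may account for the shorter results; the versioned filter additionally only matches when the version suffix equals the one in the queried Ensembl release. To keep the sketch runnable offline, the `getBM()` call is shown as a comment, `lookup` is a hand-made toy table, and `map_to_symbols` is a hypothetical helper name.

```r
# Hypothetical helper: map versioned Ensembl gene IDs to HGNC symbols using a
# lookup table with columns ensembl_gene_id and hgnc_symbol
map_to_symbols <- function(ids, lookup) {
  # Strip the version suffix, e.g. "ENSG00000036549.11" -> "ENSG00000036549"
  bare <- sub(pattern = "\\..*", replacement = "", x = as.character(ids))
  # Treat empty symbols as missing
  lookup$hgnc_symbol[which(lookup$hgnc_symbol == "")] <- NA_character_
  # match() takes the first row per ID, so there is one output entry per input;
  # IDs absent from the lookup become NA
  lookup$hgnc_symbol[match(bare, lookup$ensembl_gene_id)]
}

# With biomaRt, the lookup could come from a single vectorized query, e.g.:
# lookup <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
#                 filters = "ensembl_gene_id",
#                 values = sub("\\..*", "", data[, 1]),
#                 mart = ensembl)

# Toy lookup table for offline use (IDs and symbols are made up)
lookup <- data.frame(
  ensembl_gene_id = c("ENSGTOY0001", "ENSGTOY0002"),
  hgnc_symbol = c("GENE1", ""),
  stringsAsFactors = FALSE
)

ids <- c("ENSGTOY0001.5", "ENSGTOY0002.3", "ENSGTOY0003.1")
print(map_to_symbols(ids, lookup))
```

With `match()`, the result has the same length and order as the input IDs, so it can be written straight back into a new column of `data`.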
Side note: It's ensembl; there's no e at the end of the word.
First off, I'd recommend using parameter names when you call functions, so commands are explicit. This is especially useful with sub and gsub, as x, pattern and replacement are really weirdly positioned in these functions.
Does sub(pattern="\\..*", replacement="", x=data[1:15,1]) give you the expected output in the expected format (vector)? I recall needing to use sapply to get an unlisted vector of results from gsub.
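Since `sub()` is vectorized and coerces `x` with `as.character()`, it should return a plain character vector even when the ID column was read in as a factor (the default for `read.table()` and `data.frame()` before R 4.0), so `sapply()` is probably unnecessary here. A small offline check using named arguments; the `ids` values are made up:

```r
# Made-up versioned IDs, stored as a factor as read.table() did by default before R 4.0
ids <- factor(c("ENSGTOY0001.11", "ENSGTOY0002.3"))

# Named arguments make the unusual order of sub()'s parameters explicit
bare <- sub(pattern = "\\..*", replacement = "", x = ids)

print(class(bare))
print(bare)
```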
Thanks for the tip, I will add the parameter names (it's actually the first time I'm using R). On the current subject: the sub works fine. I just tried running the code on parts of the data, and the error is raised at element 532, which is ENSG00000036549.11 (ENSG00000036549 after the sub). I really can't see why it stopped specifically there... all the elements before it got their HGNC symbol.
I will try to use tryCatch so the loop skips the indices that trigger this error (if that's possible in R...), but if someone has a better solution, it would be helpful.
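`tryCatch()` is part of base R, so this is possible. A sketch of the per-row loop that records `NA` instead of stopping, assuming each lookup may return zero rows or raise an error; `query_symbol()` is a hypothetical stand-in for a single `getBM()` call so the example runs offline. One vectorized query is usually much faster than 60,000 separate requests, so a loop like this is mainly useful for pinpointing the problem IDs.

```r
# Hypothetical stand-in for one getBM() call: returns a data frame with a
# hgnc_symbol column (possibly zero rows) or fails for one toy ID
query_symbol <- function(id) {
  toy <- c(ENSGTOY0001 = "GENE1")
  if (id == "ENSGTOY0bad") stop("simulated query failure")
  if (id %in% names(toy)) {
    data.frame(hgnc_symbol = toy[[id]], stringsAsFactors = FALSE)
  } else {
    data.frame(hgnc_symbol = character(0), stringsAsFactors = FALSE)
  }
}

ids <- c("ENSGTOY0001.11", "ENSGTOY0002.3", "ENSGTOY0bad.1")
bare <- sub(pattern = "\\..*", replacement = "", x = ids)

# Pre-allocate the result so a failed or empty lookup leaves NA
symbols <- rep(NA_character_, length(bare))
for (i in seq_along(bare)) {
  # An error becomes NULL instead of aborting the loop
  res <- tryCatch(query_symbol(bare[i]), error = function(e) NULL)
  # Keep the first symbol only when the result has at least one row
  if (!is.null(res) && nrow(res) > 0) {
    symbols[i] <- res$hgnc_symbol[1]
  }
}

print(symbols)
# Indices that still need a closer look
print(which(is.na(symbols)))
```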
What happens when you query with just ENSG00000036549? Compare that to a couple of calls made with different gene IDs, and you should see where your code breaks.
It was the first thing I did; I get the same error message listed above...
What is your R version?
My R version is 3.5.0.
IMO 3.5 might not be mature yet - I've had problems working on 3.5 too. Can you try working on 3.4.1 maybe? You can use conda to install 3.4.1 without affecting your 3.5 installation:
Once done, you can check which R, ensure it points to the conda environment-specific R, and install Bioconductor.