Using taxonomizr in R with a list of inputs
4.1 years ago

Hello all!

I am totally new to R and I'm trying to run a function from a specific package (taxonomizr): getId.

It works like this:

data<-getId(c('Pestivirus A','Bos taurus','Homo'),taxaNames)

and as output you get a table with one row containing the IDs associated with each species.

I have a lot of inputs (around 7000) that I concatenated into a .txt file like this:

'Pestivirus A','Bos taurus','Homo'...

I have tried to copy and paste the whole .txt file into the argument of the getId function. But when I run the command, nothing happens and the console shows a + symbol instead of >. Copy-pasting only works for about a hundred inputs at most (out of 7000).

Is there a way to use the .txt file directly, to avoid doing 70 copy-pastes?

I should add that this is the first time I am using R. So far I've imported my data like this, and that's all:

   species <- read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
R software error • 1.4k views
specieds <- as.list(read.csv("list.txt", header=FALSE))
data<-getId(specieds, taxaNames)

OK, as a result I get: Error in out[taxa] : invalid subscript type 'list'

I used:

 specieds <- as.list(read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE))
 data<-getId(specieds, taxaNames)
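
The error above is consistent with `as.list()` handing `getId` a list, while indexing by name (as in `out[taxa]`) expects a plain character vector. The following is a minimal base-R sketch of the same kind of failure. The named vector `out` and its values are a made-up stand-in for illustration only; taxonomizr's internals are not shown in this thread.

```r
# Hypothetical stand-in for a lookup vector indexed by taxon name.
out <- c("Bos taurus" = "9913", "Homo" = "9605")

taxa_list <- as.list(c("Bos taurus", "Homo"))  # a list, as as.list() returns
taxa_vec  <- unlist(taxa_list)                 # a plain character vector

# Indexing with a list fails; capture the message instead of stopping.
msg <- tryCatch(out[taxa_list], error = function(e) conditionMessage(e))
print(msg)

# Indexing with a character vector works.
ids <- unname(out[taxa_vec])
print(ids)
```

Under that assumption, converting the input to a character vector before calling `getId` is the relevant change.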

 df <- read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
 species <- as.vector(df[[1]])
 data <- getId(species, taxaNames)

OK, I just did that, but I get nothing as output.

My .txt file is :

'Nannochloropsis gaditana','Hondaea fermentalgiana','Nannochloropsis gaditana','Hondaea fermentalgiana','Hondaea fermentalgiana'

print(df) gives:

                            V1                       V2                         V3                       V4
1 'Nannochloropsis gaditana' 'Hondaea fermentalgiana' 'Nannochloropsis gaditana' 'Hondaea fermentalgiana'
                        V5
1 'Hondaea fermentalgiana'

print(species) gives:

[1] "'Nannochloropsis gaditana'"

You should have shown a few lines of the input .txt file at the very beginning. How many lines are in it?

df = read.csv("t.txt", quote="'",  header=FALSE)
species = as.vector(t(df[1, ])) # 1 means the first row.

If there is more than one line, use a loop.
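
As an alternative to `read.csv()` plus a loop, base R's `scan()` returns one flat character vector however many lines the file has. This is a sketch only: it writes a temporary file with made-up contents in place of the real species.txt, and the `getId` call is left commented out because it needs the `taxaNames` object from the original post.

```r
# Write a small stand-in file (hypothetical contents, spread over two lines).
path <- tempfile(fileext = ".txt")
writeLines(c("'Pestivirus A','Bos taurus'", "'Homo'"), path)

# sep = "," splits on commas, quote = "'" strips the single quotes,
# and what = character() keeps every field as text.
species <- scan(path, what = character(), sep = ",", quote = "'",
                strip.white = TRUE, quiet = TRUE)
print(species)

# data <- getId(species, taxaNames)  # as in the original post
```

Because `scan()` treats line breaks as field separators too, the same call should cover both the one-line and the multi-line case.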


It is the only line. I tried the command with this small subset, nothing more. Could the problem come from the encoding?
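
If the stray characters at the start of the file are a UTF-8 byte-order mark (an assumption; the thread does not confirm it), one option is to let R strip it while reading, via `fileEncoding = "UTF-8-BOM"`. The sketch below builds a temporary file that starts with a BOM, using made-up contents, and reads it back.

```r
# Build a stand-in file that starts with a UTF-8 byte-order mark (EF BB BF).
path <- tempfile(fileext = ".txt")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
txt  <- charToRaw("'Nannochloropsis gaditana','Hondaea fermentalgiana'\n")
writeBin(c(bom, txt), path)

# fileEncoding = "UTF-8-BOM" drops the mark, so the first field starts
# with the quote character and quote = "'" can strip it as intended.
df <- read.csv(path, header = FALSE, quote = "'",
               fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
print(df)
```

Without that argument, the mark would sit in front of the first quote, which may explain why the quotes survived in the first column only.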


The problem does not come only from the encoding. I ran df[1] <- NULL to delete this "bad variable", and it still doesn't work. However, when I copy and paste this subset, it works. EDIT: actually it works for the first species of my .txt file (after deleting the encoding).

> print (species)
[1] "Hondaea fermentalgiana"
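
One possible reading of this result: with a one-line file, `read.csv()` puts each name in its own column, so `df[[1]]` returns only the first column, that is, a single name. The sketch below uses a hand-built data frame (hypothetical, shaped like the printed `df` above) to contrast taking the first column with flattening the first row.

```r
# Hand-built stand-in for the one-row data frame read from the file.
df <- data.frame(V1 = "Nannochloropsis gaditana",
                 V2 = "Hondaea fermentalgiana",
                 V3 = "Nannochloropsis gaditana",
                 stringsAsFactors = FALSE)

first_column <- as.vector(df[[1]])     # one column, hence a single name
whole_row    <- unname(unlist(df[1, ]))  # every name in the first row

print(first_column)
print(whole_row)

# Optional: drop repeated names before the lookup.
species <- unique(whole_row)
print(species)
```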

"I have a lot of inputs (around 7000) that I concatenated into a .txt file like this"

It seems the one-line format is not the original format. One name per line would be the easiest and most convenient format for downstream processing.

Also, 7000 is not a small number. You can try taxonkit name2taxid to map scientific names to TaxIDs or to further retrieve lineages. It supports Windows, but you need to run it in a command-line console.
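
If the original one-name-per-line file is no longer available, it can be rebuilt from the one-line file in R. The following is a rough sketch with temporary files and made-up contents. It assumes that no species name itself contains a comma or an apostrophe, since it simply splits on commas and removes every single quote.

```r
# Stand-in for the one-line, quoted, comma-separated file.
one_line <- tempfile(fileext = ".txt")
writeLines("'Pestivirus A','Bos taurus','Homo'", one_line)

# Split on commas, then remove the single quotes and surrounding spaces.
raw     <- readLines(one_line)
fields  <- unlist(strsplit(raw, ",", fixed = TRUE))
species <- trimws(gsub("'", "", fields, fixed = TRUE))

# Write one name per line, a format that is easier to reuse downstream.
per_line <- tempfile(fileext = ".txt")
writeLines(species, per_line)
print(readLines(per_line))
```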
