I have a file called species.txt that I edited to contain only a list of species from BLAST output: literally one column of species names, each consisting of a genus and species separated by a space.
My goal is to loop through that file (each line is a species) and obtain the taxonomic ranks of each species using tax_name(species, get=c("superkingdom","kingdom","phylum","order","family","genus","species"), db="ncbi").
Afterwards I would grab whichever taxonomy ranking I desire for each species and create some tables with that information.
Is there a simple way to do this in R without using a for loop? I think lapply would work, but I can't figure out how to read in the file without writing a for loop (which I am terrible at). Also, some files will have thousands of species in the list, so in my opinion a for loop would not be efficient.
I have read in my file using readLines, but I'm still unsure how to turn this into a function call with lapply. Is it possible to avoid a for loop here at all?
Thanks!
I can't quite tell all the steps of what you're trying to do, but to read in a one-column file and loop through the entries using lapply, you could do something like the following:
Even though your file is NOT tab-separated, set the separator to a tab so that the space between genus and species is not treated as a field separator, and each line is read as a single row.
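A minimal sketch of that approach, assuming species.txt has one name per line and no header row. The helper names `read_species` and `lookup_all` are hypothetical, and `lookup` stands for whatever per-species function you want to run (for example, a wrapper around tax_name).

```r
# Hypothetical helper: read a one-column file of species names.
# sep = "\t" makes read.table() split fields only on tabs, so the space in
# "Escherichia coli" does not break the name into two columns.
read_species <- function(path) {
  read.table(path, sep = "\t", header = FALSE)$V1
}

# Hypothetical helper: apply `lookup` to every name with lapply(), which
# returns a list holding one result per name, in file order.
lookup_all <- function(species, lookup) {
  lapply(species, lookup)
}
```

For example, `results <- lookup_all(read_species("species.txt"), print)` prints each name and keeps the results in a list. Note that lapply() is not inherently faster than a for loop in R; with thousands of names, the run time is typically dominated by the remote database queries, not by the looping construct.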
I tried that, but I get this error which I don't understand:
Error: sci_com must be of class character, taxon_state
But I don't understand why this is an issue when I pass the species from a list. If I type in a name individually instead of x, tax_name works fine. When I run str(species), it says it is a data frame containing a factor with 52 levels.

Never mind: just adding as.character(x) in tax_name seemed to remove that error. However, nothing is being printed. It just reports that each species name is "found" and goes through them all, but I don't see the lineage information, which I was hoping to save separately.

OK, so you're using a package that contains an actual function called tax_name(). Looking this up, I see that it returns a data.frame. So if you want to capture that value, you can assign it to a variable and do something with it: either process it further, or simply return it.
Also, in the original call to read your species.txt file, you might add an argument which prevents values from being treated as factors:
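The sketch below shows the difference that argument (stringsAsFactors) makes. It writes two example names to a temporary file so it runs on its own; from R 4.0.0 onward, FALSE is already the default.

```r
# Write two example names to a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("Escherichia coli", "Bacillus subtilis"), path)

as_factors <- read.table(path, sep = "\t", stringsAsFactors = TRUE)
as_chars   <- read.table(path, sep = "\t", stringsAsFactors = FALSE)

class(as_factors$V1)  # "factor", which is what str(species) reported
class(as_chars$V1)    # "character", the class the tax_name() error asks for
```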
I ended up switching to Python and writing my own code to grab the lines. There were too many issues with this package when working with bacterial sequences: a lot of the names just couldn't be found, and this package doesn't use taxids to search. Thanks, though; I think your approach would have worked if the name lookup had actually succeeded for the names I had!