6.2 years ago
Medhat
9.8k
I am working in R, trying to get the organism name (description) of a nucleotide record:
> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "xml")
# also tried entrez_search
With entrez_fetch I get an empty list; with entrez_search I get an ID.
When I use Python (Biopython), it works:
from Bio import Entrez, SeqIO
handle = Entrez.efetch(db="nucleotide", id="NC_001479.1", rettype="gb", retmode="text")
x = SeqIO.read(handle, 'genbank')
print(x.description)  # the SeqRecord attribute is lowercase
# result: 'Encephalomyocarditis virus, complete genome'
What am I doing wrong in R?
Thanks
OK, I think I also need to use rettype = "gb" in R and then process the results.
At the end I wrote this function:
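library(rentrez)  # provides entrez_fetch
library(xml2)     # provides read_xml, xml_find_all, xml_text
retrive_title <- function(gi){
  # fetch the GenBank record as XML and return its organism name
  handel <- entrez_fetch(db = "nucleotide", id = gi, rettype = "gb", retmode = "xml")
  xml_handel <- read_xml(handel)
  xml_text(xml_find_all(xml_handel, "//GBSeq_organism"))
}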
example:
retrive_title("NC_023021.1")
[1] "Formica exsecta virus 1"
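For comparison with the Python attempt, the same idea can be sketched in Python using only the standard library. The function above reads GBSeq_organism; the record title sits in a separate element, GBSeq_definition. The XML string here is a hand-trimmed stand-in for what a fetch with rettype "gb" and retmode "xml" returns, not real output, and extract_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for a GBSet document (an assumption for
# illustration); a real response has many more elements per record.
SAMPLE_XML = """<GBSet>
  <GBSeq>
    <GBSeq_definition>Encephalomyocarditis virus, complete genome</GBSeq_definition>
    <GBSeq_organism>Encephalomyocarditis virus</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def extract_fields(xml_text):
    """Return a list of (title, organism) pairs, one per GBSeq record."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iter("GBSeq"):
        # findtext returns None if the child element is missing
        title = rec.findtext("GBSeq_definition")
        organism = rec.findtext("GBSeq_organism")
        results.append((title, organism))
    return results


fields = extract_fields(SAMPLE_XML)
for title, organism in fields:
    print("title:   ", title)
    print("organism:", organism)
```

Reading both elements side by side makes it easy to see which of the two strings a given call is returning.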
Is this question resolved?
Not yet, I still need to find out how to extract the name from the result, or maybe there is a better solution than the one I wrote.
Using NCBI unix utils:
What you are asking for is under the heading Title, not Organism. The difference is demonstrated above.
Thanks, I will try to apply that in Python or R.
Unfortunately, it did not fix the main issue: using rettype = "xml" gives an empty list, and using rettype = "gb" gives a bulk string that is not easy to handle (regardless of whether I want the title or the organism).
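If the bulk string is GenBank flat-file text, the two fields can be pulled out with plain string handling. This is only a rough sketch: SAMPLE_GB is a hand-written excerpt containing just the header lines discussed here, header_field is a hypothetical helper, and it assumes the usual flat-file layout where the keyword occupies the first 12 columns. A proper parser such as Biopython's SeqIO is the more robust option.

```python
# Hand-written excerpt in GenBank flat-file layout (an assumption for
# illustration); a real record has many more header lines and features.
SAMPLE_GB = """\
DEFINITION  Encephalomyocarditis virus, complete genome.
SOURCE      Encephalomyocarditis virus
  ORGANISM  Encephalomyocarditis virus
"""


def header_field(gb_text, keyword, multiline=False):
    """Return the value of the first header line whose keyword matches.

    With multiline=True, continuation lines (indented by 12 spaces) are
    appended. Leave it False for ORGANISM, because in real records the
    lines after ORGANISM hold the taxonomic lineage, not the name.
    """
    lines = gb_text.splitlines()
    for i, line in enumerate(lines):
        if line[:12].strip() == keyword:
            value = line[12:].strip()
            if multiline:
                for cont in lines[i + 1:]:
                    if not cont.startswith(" " * 12):
                        break
                    value += " " + cont.strip()
            return value
    return None


# DEFINITION lines end with a period in the flat file; drop it for display.
title = header_field(SAMPLE_GB, "DEFINITION", multiline=True).rstrip(".")
organism = header_field(SAMPLE_GB, "ORGANISM")
print(title)
print(organism)
```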
If you are going to use xml, then:
With gb:
Because I am in R, I used the following (feel free to advise me of something better):
The output will be: [1] "Encephalomyocarditis virus"
I think there should be something easier than this