5.0 years ago
flogin
I'm working on a small script that takes a list of IDs, accesses the NCBI Nucleotide section, and retrieves the host information for each ID.
So I wrote this:
import requests as req

link_nucleotide = 'https://www.ncbi.nlm.nih.gov/nuccore/'
lst_terms = ['MG873553.1','MG873552.1','MG873551.1','MG873550.1','MG251660.1','MG251659.1','MG251658.1','MG251657.1','KX650071.1','KX650070.1']

for i in lst_terms:
    link_id = link_nucleotide+i
    response = req.get(link_id)
    my_file = response.text
    print(my_file)
But when I read the output, there is no field called "/host=", even though it is visible at https://www.ncbi.nlm.nih.gov/nuccore/MG873553.1 (/host="Elymana sulphurella", for the first ID).
So, is there another way to access the text of this URL and retrieve the host information?
Best.
If you need to use Python, I suggest using Biopython, specifically the Entrez module. The documentation and the numerous posts on Biostars should get you up and running.
If Python is not a requirement, you should check out Entrez Direct to download NCBI data from the command line.
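To make the suggestion concrete, here is a minimal sketch of fetching a record through NCBI's E-utilities `efetch` endpoint, which is the service Biopython's Entrez module wraps, using only the standard library. The helper names (`build_efetch_url`, `extract_host`, `fetch_host`) are made up for this example, and the regular expression is a simple assumption about how the `/host` qualifier appears in the GenBank flat file, not a full parser.

```python
import re
import urllib.parse
import urllib.request

# Public E-utilities endpoint for downloading records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Return the efetch URL for one nucleotide accession as a GenBank flat file."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def extract_host(genbank_text):
    """Return the value of the first /host="..." qualifier, or None if absent."""
    match = re.search(r'/host="([^"]*)"', genbank_text)
    if match is None:
        return None
    # Long qualifier values can wrap across lines, so collapse the whitespace.
    return " ".join(match.group(1).split())


def fetch_host(accession):
    """Download one record from NCBI (needs network access) and return its host."""
    with urllib.request.urlopen(build_efetch_url(accession)) as response:
        text = response.read().decode("utf-8")
    return extract_host(text)
```

Calling `fetch_host('MG873553.1')` for each ID in `lst_terms` should then give the host where the record has one. NCBI asks for no more than three requests per second without an API key, so a short `time.sleep` between calls is a good idea for longer lists.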
Thanks vkkodali, I'm using this module now and everything is working well, but I have one quick question:
I'm using this code:
That returns:
How can I read this output to access specific ranks? For example: phylum = Arthropoda, order = Hemiptera.
Thanks.
I'd do something like this:
Thanks vkkodali, I wrote it this way to retrieve the taxonomic levels that I want:
This is perfectly fine but does not scale well if you have a whole bunch of ranks for which you need to collect data. Here's an alternative:
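As a rough sketch of what a rank lookup that scales could look like (not necessarily the code originally posted here): build a rank-to-name dictionary once, then ask it for whichever ranks you need. This assumes the record has the shape that `Entrez.read` gives for a taxonomy `efetch`, that is, a dict with a `LineageEx` list whose entries carry `Rank` and `ScientificName`. The function name `ranks_from_lineage` and the hand-written, trimmed `sample_record` are only for illustration.

```python
def ranks_from_lineage(taxon_record, wanted_ranks):
    """Map each wanted rank to its scientific name (None if the rank is missing)."""
    # Ranks such as "no rank" can occur more than once in a lineage;
    # the last occurrence wins here.
    lineage = {
        entry["Rank"]: entry["ScientificName"]
        for entry in taxon_record["LineageEx"]
    }
    return {rank: lineage.get(rank) for rank in wanted_ranks}


# Hand-written stand-in for one parsed taxonomy record (most fields omitted).
sample_record = {
    "ScientificName": "Elymana sulphurella",
    "Rank": "species",
    "LineageEx": [
        {"ScientificName": "Arthropoda", "Rank": "phylum"},
        {"ScientificName": "Insecta", "Rank": "class"},
        {"ScientificName": "Hemiptera", "Rank": "order"},
    ],
}

selected = ranks_from_lineage(sample_record, ["phylum", "order", "family"])
```

Here `selected["phylum"]` is `"Arthropoda"` and `selected["order"]` is `"Hemiptera"`, while `selected["family"]` is `None` because the trimmed sample has no family entry. Adding another rank only means adding another string to the list.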