Currently I am using Biopython's ESearch to get the list of papers for a search term. Unfortunately, I get different results from those of the web search. I have already tried the sort option, but it does not help. The total number of search results also differs.
For example, I use the following code to search for "sclerosis":
    from Bio import Entrez


    def search(query):
        """Run an ESearch on PubMed and return the parsed result."""
        Entrez.email = 'example@mail.com'
        handle = Entrez.esearch(db='pubmed',
                                sort='pub date',
                                retmax='10',
                                retmode='xml',
                                term=query)
        results = Entrez.read(handle)
        handle.close()
        print(results['Count'])  # total number of hits reported by ESearch
        return results


    def fetch_details(id_list):
        """Fetch the full records for a list of PubMed IDs."""
        ids = ','.join(id_list)
        Entrez.email = 'example@mail.com'
        handle = Entrez.efetch(db='pubmed',
                               retmode='xml',
                               id=ids)
        results = Entrez.read(handle)
        handle.close()
        return results


    if __name__ == '__main__':
        results = search('sclerosis')
        id_list = results['IdList']
        papers = fetch_details(id_list)
        # Print a numbered list of article titles.
        for i, paper in enumerate(papers['PubmedArticle']):
            print("%d) %s" % (i + 1, paper['MedlineCitation']['Article']['ArticleTitle']))
Output:
- Therapeutic potential of neuromodulation for demyelinating diseases.
- Astaxanthin Reduces Demyelination and Oligodendrocytes Death in A Rat Model of Multiple Sclerosis.
- AI-Based Methods and Technologies to Develop Wearable Devices for Prosthetics and Predictions of Degenerative Diseases.
- RNA Editing in Neurological and Neurodegenerative Disorders.
- Neuromuscular junction mitochondrial enrichment: a "double-edged sword" underlying the selective motor neuron vulnerability in amyotrophic lateral sclerosis.
- Fused in sarcoma-amyotrophic lateral sclerosis as a novel member of DNA single strand break diseases with pure neurological phenotypes.
- Mending the broken in amyotrophic lateral sclerosis: DNA damage and repair in motor neuron degeneration.
- Cognitive impairment in multiple sclerosis: lessons from cerebrospinal fluid biomarkers.
- Reorganization of multiple sclerosis health care system in Clinical Centre of Montenegro during the COVID-19 pandemic.
- Mélange intéressante: COVID-19, autologous transplants and multiple sclerosis.
No matter how I sort the results, in the web search or in the code above, I don't get the same results.
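To narrow down where the difference comes from, it may help to look at what ESearch itself reports about the query. The sketch below is only an illustration: `summarize` and `inspect_search` are hypothetical helper names, and it assumes the parsed ESearch result carries a `QueryTranslation` field and that EInfo reports a `LastUpdate` for the database, as described in the Biopython tutorial.

```python
def summarize(search_result, db_info):
    """Collect the fields that may help explain a count mismatch."""
    return {
        # Total hits as seen by E-utilities (returned as a string).
        "count": int(search_result["Count"]),
        # How the server expanded the raw term (e.g. MeSH mapping).
        "query_translation": search_result.get("QueryTranslation", ""),
        # When the database behind E-utilities was last updated.
        "db_last_update": db_info["DbInfo"]["LastUpdate"],
    }


def inspect_search(query, email="example@mail.com"):
    """Query ESearch and EInfo, then summarize both (needs network access)."""
    # Imported here so summarize() can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=query, retmax=0)
    search_result = Entrez.read(handle)
    handle.close()

    handle = Entrez.einfo(db="pubmed")
    db_info = Entrez.read(handle)
    handle.close()

    return summarize(search_result, db_info)
```

Calling `inspect_search('sclerosis')` returns the count, the translated query, and the last update time. Comparing the translated query with the search details shown on the website would show whether both sides are running the same expanded query; this does not by itself prove which database is newer.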
I am not sure if there is anything you can do about that. I see that a search for "sclerosis" via the web brings 162,112 hits (as of now), but if I do the search via EntrezDirect I see 161,858 hits. It is possible that the database searched by the web page is newer.
What is the ultimate aim of your search? You could achieve what you need with the right combination of search terms.
That is exactly my problem. The goal is to use LDA topic modeling to identify latent topics. For this purpose it would be perfect to get all available papers. I also suspected that the two sources differ in how current their data is, but I have not yet found a way to verify this.
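For collecting every ID rather than the first 10, one possible approach is to split the query into publication-date slices and run one ESearch per slice. This is a sketch under assumptions, not a tested pipeline: `year_slices` and `collect_ids` are hypothetical names, and the limit of 10,000 IDs per PubMed ESearch query is my understanding of the current NCBI documentation. The `datetype`, `mindate`, and `maxdate` parameters are standard ESearch parameters that Biopython passes through.

```python
def year_slices(first_year, last_year):
    """Yield (mindate, maxdate) pairs, one per calendar year, inclusive."""
    for year in range(first_year, last_year + 1):
        yield (f"{year}/01/01", f"{year}/12/31")


def collect_ids(query, first_year, last_year,
                email="example@mail.com", cap=10000):
    """Gather PubMed IDs year by year (needs network access).

    `cap` is the assumed maximum number of IDs that one PubMed ESearch
    query can return.
    """
    # Imported here so year_slices() can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    all_ids = []
    for mindate, maxdate in year_slices(first_year, last_year):
        handle = Entrez.esearch(db="pubmed", term=query,
                                datetype="pdat",  # filter on publication date
                                mindate=mindate, maxdate=maxdate,
                                retmax=cap)
        result = Entrez.read(handle)
        handle.close()
        if int(result["Count"]) > cap:
            # This slice is too large to retrieve in one query.
            print(f"{mindate} to {maxdate}: more than {cap} hits, "
                  "split this slice further")
        all_ids.extend(result["IdList"])
    return all_ids
```

For example, `collect_ids('sclerosis', 2015, 2020)` would issue six searches. The resulting IDs can then be passed in batches to a function like `fetch_details` above. Years with more hits than the cap would need finer slices, such as months.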