2.2 years ago
Arnau
Hello! I was hoping someone here could help me with something. I have been using the code below to scrape metadata from articles written by certain authors, but I am having trouble scraping MeSH terms. As you will see, the mesh_terms part of the code is incomplete; I have still included the version that did not work for me. Does anyone have suggestions for how to scrape MeSH terms with this code?
def capture_abstracts():
    # search() and fetch_details() are helper functions defined elsewhere
    doctor_names_filehandle = codecs.open("doctor_names.txt", "r", "utf-8")
    doctor_articles_filehandle = codecs.open("doctor_articles.txt", "w", "utf-8")
    for doctor_name in doctor_names_filehandle.readlines():
        doctor_name = doctor_name.strip()
        time.sleep(1)  # pause between queries so NCBI's request-rate limit is not exceeded
        id_list = search(doctor_name)
        if id_list:
            id_details = fetch_details(id_list)
            if id_details:
                pubmed_articles = id_details['PubmedArticle']
                for pubmed_article in pubmed_articles:
                    try:
                        pmid = pubmed_article['MedlineCitation']['PMID']
                    except:
                        pmid = ""
                    try:
                        article_title = pubmed_article['MedlineCitation']['Article']['ArticleTitle']
                    except:
                        article_title = ""
                    try:
                        journal_title = pubmed_article['MedlineCitation']['Article']['Journal']['Title']
                    except:
                        journal_title = ""
                    # MeSH terms: this is the part that does not work
                    try:
                        mesh_terms = pubmed_article['MedlineCitation']['MeshHeadingList']['MeshHeading']['DescriptorName']
                    except:
                        mesh_terms = ""
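A likely cause, worth checking against your own records: when `Bio.Entrez.read()` parses PubMed XML, `MeshHeadingList` comes back as a Python list of `MeshHeading` records rather than a dictionary. Indexing that list with `['MeshHeading']` raises a `TypeError`, and the bare `except:` silently turns it into an empty string. The sketch below walks the list instead. It assumes the parsed structure just described (each heading has a `DescriptorName` and possibly a list of `QualifierName` entries); the function name `extract_mesh_terms`, the `;` separator (chosen so it cannot clash with the `|` delimiter in the output file) and the `Descriptor/Qualifier` formatting are illustrative choices, not part of Biopython.

```python
def extract_mesh_terms(pubmed_article, sep=";"):
    """Return the MeSH headings of one parsed PubmedArticle as a single string.

    Assumes the structure Bio.Entrez.read() produces for PubMed efetch XML:
    MeshHeadingList is a list of MeshHeading dicts, each holding a
    'DescriptorName' and (optionally) a list of 'QualifierName' entries.
    """
    citation = pubmed_article.get("MedlineCitation", {})
    # Not every record is MeSH-indexed yet, so fall back to an empty list
    headings = citation.get("MeshHeadingList", [])
    terms = []
    for heading in headings:
        descriptor = str(heading["DescriptorName"])
        qualifiers = [str(q) for q in heading.get("QualifierName", [])]
        if qualifiers:
            # One "Descriptor/Qualifier" entry per qualifier (illustrative format)
            terms.extend(f"{descriptor}/{q}" for q in qualifiers)
        else:
            terms.append(descriptor)
    return sep.join(terms)
```

Inside the loop, the whole `mesh_terms` try/except could then be replaced by `mesh_terms = extract_mesh_terms(pubmed_article)`. If you only want the descriptor names, drop the qualifier branch.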
This is how the data is written:
doctor_articles_filehandle.write("|".join([doctor_name, pmid, mesh_terms, journal_title, article_title, article_date, abstract]) + "\n")
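`str.join` raises a `TypeError` if any item is not a string, and a newline inside a title or abstract would split one record across two lines of the pipe-delimited file. A minimal sketch of a hypothetical `format_row` helper that guards against both, assuming every field should simply be converted to text and that stray `|` characters inside a field can be replaced with a space:

```python
def format_row(fields, delimiter="|"):
    """Join one record's fields into a single delimited output line.

    Converts every field to str (None becomes an empty string) and replaces
    characters that would break the one-record-per-line, pipe-delimited layout.
    """
    cleaned = []
    for field in fields:
        text = "" if field is None else str(field)
        # Titles and abstracts may contain line breaks or the delimiter itself
        text = text.replace("\r", " ").replace("\n", " ").replace(delimiter, " ")
        cleaned.append(text)
    return delimiter.join(cleaned) + "\n"
```

The write call would then become `doctor_articles_filehandle.write(format_row([doctor_name, pmid, mesh_terms, journal_title, article_title, article_date, abstract]))`.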
I am importing:
from Bio import Entrez
import time
import codecs
Thank you in advance for any help! It would be great to find a solution to this.