Hi,
I am trying to extract PubMed records with the Biopython library based on some gene names (e.g. all PMIDs whose abstracts contain these gene names). I wrote the following code and it returns some results, but I am not sure that it is working correctly. I am wondering whether this code will miss articles that use similar gene symbols (e.g. P53 for TP53) or synonyms. Also, can I trust PubMed's filtering with this approach, or should I fetch all of the abstracts and search/filter them manually?
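from Bio import Entrez, Medline

Entrez.email = "you@example.com"  # NCBI asks for a contact address; replace with your own

# Search PubMed; esearch returns only the first 20 IDs unless retmax is set
handle = Entrez.esearch(db="pubmed", term="TP53[gene] AND BRCA1[gene] AND CXCL12[gene]")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]

# Fetch the matching records in MEDLINE format
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")  # See medline format table
records = list(Medline.parse(handle))
handle.close()

for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("Abstract:", record.get("AB", "?"))  # Abstracts
    print("")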
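On the synonym question, one option is to spell the alternative names out in the search term rather than rely on a single symbol. Below is a minimal sketch, assuming the PubMed title/abstract field tag [tiab] is the field of interest and that the synonym lists are supplied by hand. The helper names gene_clause and build_term and the synonym lists are hypothetical, for illustration only; real synonyms would have to come from a curated source such as NCBI Gene or HGNC.

```python
def gene_clause(symbol, synonyms=(), field="tiab"):
    """OR together a gene symbol and its synonyms, each limited to one PubMed field."""
    names = [symbol, *synonyms]
    return "(" + " OR ".join(f"{name}[{field}]" for name in names) + ")"


def build_term(genes, operator="OR"):
    """Join the per-gene clauses with AND (all genes) or OR (any gene)."""
    clauses = [gene_clause(symbol, synonyms) for symbol, synonyms in genes.items()]
    return f" {operator} ".join(clauses)


# Illustrative synonym lists only (assumption); they are not exhaustive.
genes = {"TP53": ["P53"], "BRCA1": [], "CXCL12": ["SDF1"]}

term = build_term(genes, operator="OR")
print(term)
```

The resulting string can be passed as the term argument of Entrez.esearch. Switching operator to "AND" would require every gene (under any of its listed names) to appear in the title or abstract, which is a much narrower search.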
I am going to make some general comments.
You will want to use OR instead of AND in your terms, since I don't get any hits with all three genes in the example above with AND when using NCBI eUtils. A ton of hits appear if the terms are used individually or combined with OR. What is your ultimate aim in doing this, since there must be a lot of records in PubMed with these terms? My search was done using: