You can use this Python script, which I wrote just now (tested only on Python 2.7):
import sys
import argparse
from Bio import Entrez
parser = argparse.ArgumentParser(description='Searches for protein sequences in the Title Word field ([TITL]) based on any provided key terms.\nSee here for further details: http://cbsu.tc.cornell.edu/resources/seq_comp/pb607_introductory/entrez/ncbi_entrez.html')
parser.add_argument('-e', action='store', dest='EmailAddress', required=True, help='Entrez requires your email address.')
parser.add_argument('-t', action='store', dest='SearchTerm', required=True, help='Requires a search term (wrap in double quotes).')
arguments = parser.parse_args()
Entrez.email = arguments.EmailAddress
SearchTerm = arguments.SearchTerm
#LookupCommand = "refseq[FILTER] AND txid9606[Organism] AND " + SearchTerm + "[TITL]"
LookupCommand = "refseq[FILTER] AND " + SearchTerm + "[TITL]"
# Note: esearch returns at most 20 IDs by default; pass retmax=<n> to retrieve more
handle = Entrez.esearch(db='protein', term=LookupCommand)
results = Entrez.read(handle)
handle.close()
# Look up the FASTA sequence for each protein by its GeneInfo Identifier (GI) number
for gi in results['IdList']:
    handle = Entrez.efetch(db='protein', id=gi, rettype='fasta')
    print(handle.read())
    handle.close()
Execute it like this:
python ProteinFASTASearchByFASTATitle.py -e Me@MyEmail.com -t "RNA polymerase subunit" > protein.fa
python ProteinFASTASearchByFASTATitle.py -e Me@MyEmail.com -t "ribosomal protein" >> protein.fa
sed -i '/^$/d' protein.fa
head protein.fa
>WP_098657443.1 RNA polymerase subunit sigma-70 [Bacillus toyonensis]
MNQSYSSLNRDESLTRTINLGTTARSIGPLVKPEDENFEVKEIWNYKVLSKQESLNLFRRYKHGEKDLRE
YLFHVNIGLVLSIARKYKKKHPEIEFDDLVQEGNEGMLRAIEDFDPDLGYCFSTYAYCWIKKSMLGFICK
KKSGPFKIPNYVNQFNVKYVEIEDKYLQMHNRIPTVEEVVKELDVTREKVVRHNVYYNWVTTMTLDIDTI
NEDIGILNSFCNDNSAIPSTNEMIMEDLNYEIWIIFDEVLNPKQKMVLNLCFGLLDGEIHLHKEIAKALM
ITTERVSQLKDEAINRLKKCDYKDEIFNLLHAKLKVMDELNMA
>WP_098657164.1 RNA polymerase subunit sigma-70 [Bacillus toyonensis]
MKPATFTETVVLYEGMIVNQIKKLSIYQDHEEYYQCGLIGLWYAYERYEEGKGSFPAYAVITVRGYILER
LKKECIMQERYVCVGEYDEQFESEETGMRAQDFMSVLNKRERHIISERFFVGKKMGEIACEMGMTYYQVR
WIYRQALEKMRDSVKG
The sed command just deletes the empty lines that the Entrez efetch output contains.
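If you would rather skip the sed step, the blank lines could also be dropped in Python before the record is written out. This is only a minimal sketch: strip_blank_lines is a hypothetical helper name (it is not part of the script above or of Biopython), and the sample string merely stands in for what handle.read() might return.

```python
import sys


def strip_blank_lines(fasta_text):
    """Return fasta_text without empty or whitespace-only lines."""
    kept = [line for line in fasta_text.splitlines() if line.strip()]
    # Re-join with a single trailing newline so records concatenate cleanly.
    return ("\n".join(kept) + "\n") if kept else ""


if __name__ == "__main__":
    # Assumed example input: one FASTA record followed by an empty line.
    raw = ">seq1 example protein\nMNQSYSS\nLNRDESL\n\n"
    sys.stdout.write(strip_blank_lines(raw))
```

In the script's loop, this would mean replacing the print call with sys.stdout.write(strip_blank_lines(handle.read())).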
This script searches for your term in the [TITL] (Title Word) field in Entrez, which contains the product name (see here: http://cbsu.tc.cornell.edu/resources/seq_comp/pb607_introductory/entrez/ncbi_entrez.html ). If you want just human sequences, un-comment the following line in the script and comment out the LookupCommand line beneath it:
#LookupCommand = "refseq[FILTER] AND txid9606[Organism] AND " + SearchTerm + "[TITL]"
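Rather than commenting lines in and out, the organism filter could be made a parameter of the query. The sketch below is only an illustration: build_lookup_command and its taxid argument are hypothetical names, not part of the script, and it simply reproduces the two query strings the script already uses.

```python
def build_lookup_command(search_term, taxid=None):
    """Build the Entrez query string for a title-word search in RefSeq.

    taxid is an optional NCBI taxonomy ID (for example 9606 for human);
    when it is None, no organism filter is added.
    """
    parts = ["refseq[FILTER]"]
    if taxid is not None:
        parts.append("txid%s[Organism]" % taxid)
    parts.append(search_term + "[TITL]")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_lookup_command("ribosomal protein"))
    print(build_lookup_command("ribosomal protein", taxid=9606))
```

The result could then be passed as the term argument of Entrez.esearch, with the taxonomy ID coming from an extra command-line option if you add one.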
Kevin