Hello! I am using the BioGRID REST service to fetch the protein-protein interactions (PPI) for Homo sapiens, following the instructions in the BioGRID repository and using this script as a template: https://github.com/BioGRID/BIOGRID-REST-EXAMPLES/blob/master/get_interactions_for_pandas.py However, when I check the shape of the DataFrame, it always shows at most 10K interactions. I understand that this is due to a restriction of the REST service, which caps the number of interactions returned at 10K. Has anyone faced the same problem before, and how did you solve it?
Here is my script:
```
import pandas as pd
import requests

# Define the URL for fetching protein interactions from BioGRID
BIOGRID_URL = "https://webservice.thebiogrid.org/interactions/?"
# BioGRID API key
API_KEY = "myKEY"
# Species ID
SPECIES_ID = "9606"  # Homo sapiens


def fetch_protein_interactions():
    """Request the interactions for SPECIES_ID and return the parsed JSON."""
    params = {
        "taxId": SPECIES_ID,
        "accesskey": API_KEY,
        "format": "json",
        # "interSpeciesExcluded": "true",
        # "selfInteractionsExcluded": "true",
        # "includeEvidence": "true",
        # "throughputTag": "true"
    }
    try:
        response = requests.get(BIOGRID_URL, params=params)
        response.raise_for_status()
        interactions_data = response.json()
        return interactions_data
    except requests.exceptions.RequestException as e:
        print("Error fetching data:", e)
        return None


def transform_to_dataframe(interactions_data):
    """Build a DataFrame with one row per interaction."""
    # Extract relevant data fields
    interactions = []
    for interaction_id, interaction_info in interactions_data.items():
        interaction_entry = {
            "BioGrid ID_A": interaction_info["BIOGRID_ID_A"],
            "BioGrid ID_B": interaction_info["BIOGRID_ID_B"],
            "Organism A": interaction_info["ORGANISM_A"],
            "Organism B": interaction_info["ORGANISM_B"],
            "SymbInter_A": interaction_info["OFFICIAL_SYMBOL_A"],
            "SymbInter_B": interaction_info["OFFICIAL_SYMBOL_B"],
            "Gen A": interaction_info["ENTREZ_GENE_A"],
            "Gen B": interaction_info["ENTREZ_GENE_B"],
            "Experimental System": interaction_info["EXPERIMENTAL_SYSTEM"],
            "Experimental System Type": interaction_info["EXPERIMENTAL_SYSTEM_TYPE"],
            "Throughput": interaction_info["THROUGHPUT"],
            "Quantitation": interaction_info["QUANTITATION"],
            "Qualification": interaction_info["QUALIFICATIONS"],
            "Pubmed Author": interaction_info["PUBMED_AUTHOR"],
        }
        interactions.append(interaction_entry)
    # Convert to Pandas DataFrame
    interactions_df = pd.DataFrame(interactions)
    return interactions_df


if __name__ == "__main__":
    interactions_data = fetch_protein_interactions()
    # print(interactions_data)  # Debug: print interactions_data
    if interactions_data:
        interactions_df = transform_to_dataframe(interactions_data)
        print("DataFrame Size:")
        print(interactions_df.shape)  # shape of df
        print("Number of Interactions:")
        print(len(interactions_df))  # number of interactions
        # print(interactions_df.head(10))  # dataframe head
```
How about 'just' using the XML dump? https://wiki.thebiogrid.org/doku.php/psi-mi_xml_version_2.5
Thank you for your answer. It is a good option; however, working with such files for further filtering is a more complicated task in my opinion, or at least I have not been able to do it. That is why I was looking for a REST service option.
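If you want to stay with the REST service, one possible way around the 10K cap is to request the results in pages and merge them. The sketch below is not tested against the live service. It assumes the service accepts `start` (offset of the first result) and `max` (results per request) as paging parameters, so please check the BioGRID REST documentation before relying on it. The names `fetch_page`, `fetch_all`, and `PAGE_SIZE` are made up for this example, and it uses the standard library `urllib` instead of `requests` only to keep it dependency-free.

```python
import json
import urllib.parse
import urllib.request

BIOGRID_URL = "https://webservice.thebiogrid.org/interactions/"
PAGE_SIZE = 10000  # the 10K cap described in the question


def fetch_page(start, page_size=PAGE_SIZE, access_key="myKEY", tax_id="9606"):
    """Fetch one page of interactions as a dict keyed by interaction ID."""
    params = {
        "taxId": tax_id,
        "accesskey": access_key,
        "format": "json",
        "start": start,    # assumed paging parameter: offset of the first result
        "max": page_size,  # assumed paging parameter: results per request
    }
    url = BIOGRID_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)


def fetch_all(page_fetcher=fetch_page, page_size=PAGE_SIZE, max_pages=None):
    """Request consecutive pages until one comes back short or empty."""
    all_interactions = {}
    start = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        page = page_fetcher(start, page_size)
        if not page:
            break  # nothing left to fetch
        all_interactions.update(page)
        pages += 1
        if len(page) < page_size:
            break  # a short page means this was the last one
        start += page_size
    return all_interactions
```

The merged dict has the same shape as a single response, so it should be possible to pass it straight to the `transform_to_dataframe` function from the question, e.g. `transform_to_dataframe(fetch_all(max_pages=2))` for a quick trial. Fetching the whole human set this way means many requests, so a short pause between pages may be a good idea.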