Hello,
I want to extract the human-HCV (Hepatitis C virus) protein-protein interactions (PPIs). To do this, I downloaded the entire content of the IntAct database as a .txt file. This file is huge (4 GB). I tried to convert it to a CSV file with Python and then extract just the human-HCV PPIs. The problem is the size of the file: I run into a memory error.
input:
import pandas as pd
read_file = pd.read_csv('intact.txt', delimiter='\t')
read_file.to_csv('intact.csv', index=None)
output: `MemoryError: Unable to allocate 162. MiB for an array with shape (41, 1035669) and data type object`
How should I solve this issue?
I would sincerely appreciate your help.
Memory errors happen a lot with Python when using the 32-bit Windows version. This is because a 32-bit process only gets 2 GB of memory to play with by default.
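To check whether this applies to you, here is a minimal sketch that reports whether the running interpreter is 32-bit or 64-bit. It only uses the standard library; the helper name `python_bitness` is made up for this example.

```python
import struct
import sys


def python_bitness() -> int:
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    # "P" is the struct format code for a C pointer; its size is in bytes.
    return struct.calcsize("P") * 8


bits = python_bitness()
print(f"This Python interpreter is {bits}-bit")

# A second, independent check: sys.maxsize only exceeds 2**32 on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32-bit, installing a 64-bit Python build is probably the first thing to try before changing any pandas code.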
One way to reduce memory use is the dtype option of pandas.read_csv(). It lets pandas know what types exist inside your CSV data, so it does not have to guess them.
For example, passing dtype={'age': int} to read_csv() tells pandas that the age column should be interpreted as an integer. This can save you a lot of memory.
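To show where the option goes: `dtype` is passed inside the `read_csv()` call itself, not on a line before it. The sketch below also adds `chunksize`, another `read_csv()` option that was not mentioned above, so the file is read piece by piece and only matching rows are written out. Several things here are assumptions, not facts about your file: the column names `Taxid interactor A` and `Taxid interactor B`, the taxonomy strings `taxid:9606(` for human and `taxid:11103(` for HCV, the helper name `filter_pairs`, and the tiny made-up sample data used in place of `intact.txt`. Please check them against the header and contents of your download (HCV strains may appear under other taxids).

```python
import io
import os
import tempfile

import pandas as pd


def filter_pairs(source, out_path, col_a, col_b, tax_1, tax_2, chunksize=100_000):
    """Stream a tab-separated file in chunks and keep rows where one
    interactor contains tax_1 and the other contains tax_2 (either order).
    Returns the number of rows written to out_path."""
    first = True
    kept = 0
    # dtype=str skips type guessing; chunksize makes read_csv return an
    # iterator of DataFrames instead of loading the whole file at once.
    reader = pd.read_csv(source, sep="\t", dtype=str, chunksize=chunksize)
    for chunk in reader:
        a = chunk[col_a].fillna("")
        b = chunk[col_b].fillna("")
        a1 = a.str.contains(tax_1, regex=False)
        a2 = a.str.contains(tax_2, regex=False)
        b1 = b.str.contains(tax_1, regex=False)
        b2 = b.str.contains(tax_2, regex=False)
        hits = chunk[(a1 & b2) | (a2 & b1)]
        # Write the header once, then append the later chunks.
        hits.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
        first = False
        kept += len(hits)
    return kept


# Made-up sample standing in for intact.txt (hypothetical IDs and layout).
sample = (
    "ID A\tID B\tTaxid interactor A\tTaxid interactor B\n"
    "x1\ty1\ttaxid:9606(human)\ttaxid:11103(HCV)\n"
    "x2\ty2\ttaxid:9606(human)\ttaxid:9606(human)\n"
    "x3\ty3\ttaxid:11103(HCV)\ttaxid:9606(human)\n"
)

out_path = os.path.join(tempfile.mkdtemp(), "human_hcv.csv")
kept = filter_pairs(
    io.StringIO(sample), out_path,
    "Taxid interactor A", "Taxid interactor B",
    "taxid:9606(", "taxid:11103(",
    chunksize=2,  # tiny value so the sample is split into two chunks
)
result = pd.read_csv(out_path, dtype=str)
print(kept, "matching rows written to", out_path)
```

For the real file you would pass `'intact.txt'` instead of `io.StringIO(sample)` and leave `chunksize` at a larger value. The trailing `(` in the taxonomy strings is there so that `taxid:9606(` does not also match a longer taxid that merely starts with 9606.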
Did you try zero initialization?
Did you check the usual suspects on Stack Overflow, for example https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type
No, I didn't try. Sorry, I'm not an expert in Python. Should I put the code you mentioned before this line?
read_file = pd.read_csv('intact.txt', delimiter='\t')