anasjamshed ▴ 140 · 3.1 years ago
I want to know how much RAM is required to load a 15 GB data file and clean the dataset with Python pandas.
I have a 14.8 GB file that contains gene information, with a total of 45,804,630 rows and 39 columns.
When I tried to open it by reading 1,000,000 rows through pandas, it worked fine.
Code:

import pandas as pd

data = pd.read_csv("CosmicGenomeScreensMutantExport.tsv", sep='\t', nrows=1000000)
But when I try to read the whole dataset at once, it hangs my PC. I only have 4 GB of RAM, so should I increase my RAM?
If you need to hold a 15 GB file in memory then you obviously need more than 15 GB of RAM, possibly a lot more depending on the precision at which you read your data (e.g. float16, float32, etc.). The pandas concat function could be of use to you; check e.g. this.
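To make the "possibly a lot more" concrete, here is a minimal sketch that estimates the full-table footprint by scaling up a 1,000,000-row sample; it assumes the sampled rows are representative of the whole file, and the file name and row count are taken from the question:

import pandas as pd

# Read a 1,000,000-row sample (as in the question) and scale its
# in-memory size up to the full 45,804,630 rows to estimate total RAM.
sample = pd.read_csv("CosmicGenomeScreensMutantExport.tsv",
                     sep="\t", nrows=1_000_000)

sample_bytes = sample.memory_usage(deep=True).sum()
total_rows = 45_804_630
estimate_gb = sample_bytes * (total_rows / len(sample)) / 1024**3
print(f"~{estimate_gb:.0f} GB needed to hold the full table in memory")

Note that concatenating all chunks with pd.concat still needs the full footprint at the end, so it only helps if you filter rows or downcast dtypes in each chunk first.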
I am trying to run this code from my friend's PC, which has 8 GB of RAM. Will it work?
I don't understand why you would need to have the whole file in memory if you only need to "clean" it, whatever that means. That sounds like something that shouldn't require all the rows at the same time, so why not just process the file in chunks, as in the sketch below?
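For illustration, a minimal chunked-cleaning sketch: clean_chunk() and the output file name are hypothetical placeholders for whatever cleaning is actually needed, and no more than 1,000,000 rows are ever held in RAM at once.

import pandas as pd

def clean_chunk(df: pd.DataFrame) -> pd.DataFrame:
    # Placeholder cleaning step; replace with the actual cleaning logic.
    return df.dropna()

# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file at once.
reader = pd.read_csv("CosmicGenomeScreensMutantExport.tsv",
                     sep="\t", chunksize=1_000_000)

for i, chunk in enumerate(reader):
    cleaned = clean_chunk(chunk)
    # Append each cleaned chunk to one output file; write the header once.
    cleaned.to_csv("CosmicGenomeScreensMutantExport.cleaned.tsv",
                   sep="\t", index=False,
                   mode="w" if i == 0 else "a",
                   header=(i == 0))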
But I want to clean the whole dataset; that's why I need to load it all at once.
I'm just guessing here, but you probably need a minimum of around 64 GB of RAM if you want to hold the whole file in memory.
Oh no. How can I acquire 64 GB of RAM?