4.1 years ago
Kinoppy
Good morning, I'm working with a large data set: a table of cluster analysis data from an HTS sequencing run. RStudio keeps crashing, especially when I try to make plots or merge two data frames. Sometimes the session also crashes (session aborted) when I just load the data in the console, so I am basically unable to proceed with my analyses.
Is there a way to work with such a data set using the tidyverse and vegan packages without RStudio crashing?
I'm using the latest version of R (v4.0.2) and RStudio (v1.3.1093) on Ubuntu 18.04.5.
Thanks for the help.
Impossible to answer without further details. You are essentially telling us that you have a problem, but not what it is: the size of the data, the code you are running, etc.
Try your workflow on a toy sample of your data. For instance, if you took 20% of your data, would it work? Load the data, take a sub-sample, then remove the original object and call gc() to free memory. If it doesn't work, take a smaller sample. If it does work, examine how many resources were consumed (e.g. use top in a terminal to follow memory usage), and try taking a larger sample. See if there's something obvious happening, like running out of memory with your full data set. Try to rule out the obvious things.
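The sub-sampling steps above could be sketched roughly as follows. This is only a minimal base R illustration: the object names (`otu_full`, `otu_sub`) and the simulated table are hypothetical stand-ins, not the poster's actual data.

```r
# Hypothetical stand-in for the real table: 1000 samples x 200 OTU columns.
set.seed(1)
otu_full <- as.data.frame(matrix(rpois(1000 * 200, lambda = 0.5), nrow = 1000))
names(otu_full) <- paste0("otu_", seq_len(ncol(otu_full)))

# Keep a random 20% of the rows.
keep <- sample(nrow(otu_full), size = floor(0.2 * nrow(otu_full)))
otu_sub <- otu_full[keep, , drop = FALSE]

# Remove the original object and ask R to release the memory.
rm(otu_full)
invisible(gc())

# Check how much memory the sub-sample occupies.
print(object.size(otu_sub), units = "MB")
```

If the workflow runs on `otu_sub`, the sampled fraction can be raised step by step while watching memory in `top`.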
This is an example of how my data are arranged:
The original data have about 12000 columns (tag, sampler, site, and all the OTUs) and 1000 rows. It is not a large data set in terms of memory, but after I apply pivot_longer() I get a table that takes more than 2.5 GB when I write it to a .csv file.
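The size jump is expected: pivoting about 12000 OTU columns across 1000 rows gives roughly 1000 × 12000 ≈ 12 million rows, each repeating the identifier columns. The toy sketch below shows the same effect at small scale. The `otu_` column prefix and the counts are made up for illustration, and dropping zero counts is only an option if zeros are not needed downstream, which is an assumption here.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy table: 3 id columns plus 5 OTU count columns.
otu_wide <- data.frame(
  tag     = paste0("s", 1:4),
  sampler = c("a", "a", "b", "b"),
  site    = c("x", "y", "x", "y"),
  otu_1   = c(0, 3, 0, 0),
  otu_2   = c(1, 0, 0, 2),
  otu_3   = c(0, 0, 0, 0),
  otu_4   = c(5, 0, 0, 0),
  otu_5   = c(0, 0, 7, 0)
)

# One row per sample/OTU pair: 4 samples x 5 OTUs = 20 rows.
otu_long <- otu_wide %>%
  pivot_longer(cols = starts_with("otu_"),
               names_to = "otu", values_to = "count")
nrow(otu_long)

# Assumption: zero counts are not needed, so they can be dropped.
otu_long_nz <- filter(otu_long, count > 0)
nrow(otu_long_nz)
```

OTU tables are usually sparse, so filtering zeros before merging or plotting may shrink the long table considerably.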
I use this function to put all the OTUs in one column because I want to merge this data with another data set that contains the taxonomic units corresponding to my OTUs. Once the data are in this form, the problems start. If I try to make a plot, R takes a long time to process the code and finally (after at least 5 minutes) the session crashes.
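One way to keep the merged table manageable is to join the taxonomy and then aggregate to the level that is actually plotted, so the plot sees far fewer rows. This is a sketch under assumptions: the taxonomy table, its `phylum` column, and all values below are hypothetical, since the real tables are not shown.

```r
library(dplyr)

# Hypothetical long-format counts (zeros already dropped).
otu_long <- data.frame(
  tag   = c("s1", "s1", "s2", "s2", "s3"),
  otu   = c("otu_1", "otu_2", "otu_1", "otu_3", "otu_2"),
  count = c(3, 1, 2, 5, 4)
)

# Hypothetical taxonomy table: one row per OTU.
taxonomy <- data.frame(
  otu    = c("otu_1", "otu_2", "otu_3"),
  phylum = c("Proteobacteria", "Firmicutes", "Proteobacteria")
)

# Attach the taxonomy to each count row.
otu_tax <- left_join(otu_long, taxonomy, by = "otu")

# Aggregate to one row per sample and phylum before plotting.
phylum_counts <- otu_tax %>%
  group_by(tag, phylum) %>%
  summarise(count = sum(count), .groups = "drop")
```

Plotting `phylum_counts` instead of the full long table means the plotting code handles one row per sample and taxon rather than one per sample and OTU.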
This is an example of one of the plots that crashes:
Another crash happens when I try to plot with the rarecurve() function from the vegan package.
For that I prepare the data in this way:
I get the same error as for ggplot.
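For reference, rarecurve() expects a wide samples-by-OTUs matrix of integer counts rather than the long table. The sketch below is not the poster's preparation code (which is not shown); the simulated data and the `otu_` prefix are assumptions. The `step` argument of rarecurve() sets the sample-size increment, so a larger value means fewer points are computed per curve.

```r
library(vegan)

# Hypothetical toy data: 6 samples, 3 id columns, 50 OTU count columns.
set.seed(42)
otu_wide <- data.frame(
  tag     = paste0("s", 1:6),
  sampler = rep(c("a", "b"), each = 3),
  site    = rep(c("x", "y", "z"), times = 2),
  matrix(rpois(6 * 50, lambda = 2), nrow = 6,
         dimnames = list(NULL, paste0("otu_", 1:50)))
)

# Keep only the OTU columns as an integer matrix, samples as row names.
counts <- as.matrix(otu_wide[, grepl("^otu_", names(otu_wide))])
rownames(counts) <- otu_wide$tag

# Drop OTUs that never occur; they add columns but no information.
counts <- counts[, colSums(counts) > 0, drop = FALSE]

# A larger step computes fewer points per curve.
rarecurve(counts, step = 5, label = FALSE)
```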
Thank you for the help.
Use ADD REPLY/ADD COMMENT when responding to existing threads. Ideally, this should be added to the original post by editing it. SUBMIT ANSWER is only for new answers to the original question.