Hello,
I have a dataset having more than 1 million rows and 11 columns and I tried using consolidate average function to merge the duplicate rows by averaging their respective values. However, since my dataset is very large, excel is not giving me the result.
Can anyone guide how I can perform consolidate function with 1 million rows in excel? is there any command I can use in excel similar to consolidate function?
Thank you.
Excel is not really suitable for crunching larger datasets or bioinformatics in general (is it bioinformatics?). Exporting your dataset into a CSV file and doing the calculation in R is my best bet. How do you specify duplicated rows if the values are different, possibly based on an identifier?
yeah it is bioinformatics. and in R if I have to do it then what code should I use? my identifier is gene symbols and I need to merge duplicates by averaging the sample values of respective gene symbols.
Generally speaking, you should invest the time to learn R or Python properly if you plan to perform these kinds of tasks more frequently. But if it is a one of thing, we might be able to help you if you can post a few lines from your file and explain what exactly you are trying to achieve (awk might work as well).
You can also give Galaxy a spin, which provides a graphical user interface, but still gives you access to powerful tools in the background. Depending on the complexity of your tasks, the grouping tools might already work for you.