5.9 years ago
landscape95
Hi everyone, I have a large compressed genetic relationship matrix (.grm.gz) file of around 870 GB. As far as I know, this file is an edge list of individuals and their pairwise relationships. I want to look up the relationship between individuals i and j, but the compressed file is already very large. Is there a way to view it without extracting it?
Your help is really appreciated!
You can use
zless input.grm.gz
to have a look at the file. Extracting specific information, however, requires other approaches. To get more detailed help, you need to tell us more about what your data looks like and what your goal is.
fin swimmer
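For a single lookup, the file can also be streamed line by line without ever writing the decompressed data to disk. The following is only a minimal sketch. It assumes a whitespace-separated edge list with four columns per line (index of individual i, index of individual j, number of non-missing SNPs, relatedness estimate); that layout is an assumption and should be checked with zless first. The function name `find_pair` and the demo file are made up for illustration.

```python
import gzip
import os
import tempfile


def find_pair(path, i, j):
    """Stream a gzipped edge list and return the first row for the pair (i, j).

    Assumed (not confirmed) column layout: index_i index_j n_snps relatedness.
    The pair is matched in either order. Returns None if it is not found.
    """
    wanted = (min(i, j), max(i, j))
    # "rt" decompresses on the fly and yields text lines; nothing is extracted to disk.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            a, b = int(fields[0]), int(fields[1])
            if (min(a, b), max(a, b)) == wanted:
                return a, b, float(fields[2]), float(fields[3])
    return None


# Tiny made-up file, only to exercise the function.
demo_rows = ["1 1 1000 1.01", "2 1 1000 0.02", "2 2 998 0.99", "3 1 1000 -0.01"]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.grm.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("\n".join(demo_rows) + "\n")

print(find_pair(demo_path, 1, 2))
```

Every call decompresses the stream sequentially until the pair is found, so on a file of this size one lookup can take a long time. If many pairs are needed, it is better to collect them all in a single pass.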
Thank you! I want to cluster the samples based on the relationships between individuals.
How did you produce the file? I have not yet come across a relationship matrix of that size, even with 1000 Genomes data.
If you are comfortable using mathematical functions on the command line (i.e. BASH / Shell), then you could just compute the Euclidean distances manually. The Euclidean distance is the square root of the sum of all squared differences, as I show here:
The Euclidean distance for Gene1 and Gene2 is (in R coding):
Check:
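For illustration, here is a small Python sketch of that calculation (it is not the R code referred to above). The values for `gene1` and `gene2` are made up, and the helper name `euclidean_distance` is hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared element-wise differences."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Made-up example values for two hypothetical vectors.
gene1 = [1.0, 2.0, 3.0]
gene2 = [4.0, 6.0, 3.0]

# Differences are 3, 4 and 0, so the sum of squares is 9 + 16 + 0 = 25.
print(euclidean_distance(gene1, gene2))  # 5.0
```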