I have a set of wrapper scripts around a Pearson r correlation Rscript, which is called on N(N-1)
pairwise combinations of files, for each chromosome (or all chromosomes), with jobs scheduled on a Slurm-based cluster.
The main script is called pearson.py
:
https://resources.altius.org/~areynolds/public/cor/src/pearson.py
Here's the makefile
showing how it is used:
https://resources.altius.org/~areynolds/public/cor/src/makefile
The --by-chr
option does pairwise comparisons on a per-chromosome basis. This can be replaced with --all
to run a distance or similarity metric on whole-genome signal.
It goes without saying that the signal in pairs of your input files should be binned identically, so that correlation is run on vectors of equal length.
If you don't want Pearson r, you could edit line 117 of https://resources.altius.org/~areynolds/public/cor/src/chromCor2 to replace cor()
with another distance or similarity metric function call.
The call to pearson.py
relies on a metadata table file, containing label and path information in the second and fourth columns.
Here's an example of what that file looks like:
https://resources.altius.org/~areynolds/public/cor/data/set1.table.txt
The second column should contain a unique label for the file specified in the fourth column.
We use Starch files to save disk space, but you can specify gzipped files in the fourth column.
If you use gzip files, you must:
Modify lines 54-80 in pearson.py
to no longer test if the input files are genuine Starch archives.
Modify the call to chromCor2
in pearson_slurm.sh
, replacing -s
with the -z
option, in order to specify gzipped input, instead of Starch input.
Once you run things and have score results, a summary file called scores.mtx
is created in the ../results
directory.
For visualization, I have a script called mtx2json.py
(https://resources.altius.org/~areynolds/public/cor/src/mtx2json.py) that converts a scores.mtx
file to a scores.json
file.
You can call this by hand, e.g.:
$ ./mtx2json.py < /path/to/scores.mtx > /path/to/scores.json
This JSON file can be brought into my checkerboard tool: https://tools.altiusinstitute.org/checkerboard/
In the checkerboard tool, you paste in a JSON file of pairwise comparison scores, import it, select the desired categories ("labels" from the second column of the metadata table), and render the selection to an SVG-formatted figure.
You could also take the scores.mtx
file and use that directly with any number of custom visualization scripts. The mtx2json.py
step is for convenience.
Can you provide example data?