I generated MDS plots for a dataset of more than 620,000 SNPs and 934 individuals using PLINK.
I followed the advice of a former lab member as well as this tutorial (https://www.staff.ncl.ac.uk/heather.cordell/pak2010MDS.html - mainly just the stuff after "To perform MDS analysis in PLINK, we first calculate a file").
plink --file dataset --genome --out dataset
plink --file dataset --read-genome dataset.genome --cluster --mds-plot $k --out "dataset_"$k
($k is set by a loop and is the number of MDS dimensions, ranging from 1 to 20.)
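A minimal bash sketch of that loop follows. It assumes the text fileset dataset.ped/dataset.map and the dataset.genome file written by the --genome step. The function name run_mds_range and the PLINK override variable are hypothetical conveniences for illustration, not part of PLINK.

```shell
#!/usr/bin/env bash
# Sketch of the per-k MDS loop (assumption: text PED/MAP input plus a
# PREFIX.genome file from an earlier --genome run).
# PLINK is a hypothetical override for the executable name; it defaults to "plink".
set -euo pipefail

# run_mds_range PREFIX FIRST LAST
# Runs one --mds-plot job per dimension count k in FIRST..LAST and
# writes each result under the output prefix PREFIX_k.
run_mds_range() {
  local prefix="$1" first="$2" last="$3"
  local plink_bin="${PLINK:-plink}"
  local k
  for ((k = first; k <= last; k++)); do
    "$plink_bin" --file "$prefix" \
      --read-genome "${prefix}.genome" \
      --cluster --mds-plot "$k" \
      --out "${prefix}_${k}"
  done
}
```

Usage: `run_mds_range dataset 1 20` reproduces the 1-to-20 loop. Each iteration reparses the text PED/MAP files, which adds to the per-run time.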
The first command (genome file creation) does take a while, but the MDS step itself takes only about 10 minutes per run and uses about half a gigabyte of RAM, which seems suspiciously fast and light.
Does anyone have experience with this? That is, am I doing something wrong, or is this normal?
Actually, once your data has been converted to binary PLINK format (.bed/.bim/.fam), genome file creation for a dataset of your size should take less than two minutes with PLINK 1.9 on most systems, and --mds-plot should take less than one minute.
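A rough bash sketch of that PLINK 1.9 workflow follows: convert to binary once, then run --genome and --mds-plot on the binary fileset. The function name plink19_mds, the PLINK override variable, and the thread count passed to --threads are illustrative assumptions; adjust them to your system.

```shell
#!/usr/bin/env bash
# Sketch of a PLINK 1.9 pipeline (assumptions: text fileset PREFIX.ped/PREFIX.map,
# a PLINK 1.9 executable named "plink" unless PLINK is set, and a chosen thread count).
set -euo pipefail

# plink19_mds PREFIX MAX_DIMS THREADS
plink19_mds() {
  local prefix="$1" dims="$2" threads="$3"
  local plink_bin="${PLINK:-plink}"

  # 1. Convert text PED/MAP to binary BED/BIM/FAM once; later steps read
  #    the binary files, which is much faster than reparsing text.
  "$plink_bin" --file "$prefix" --make-bed --out "$prefix"

  # 2. Pairwise IBS/IBD estimates, written to PREFIX.genome.
  "$plink_bin" --bfile "$prefix" --genome --threads "$threads" --out "$prefix"

  # 3. MDS from the genome file; --mds-plot is given together with --cluster.
  "$plink_bin" --bfile "$prefix" --read-genome "${prefix}.genome" \
    --cluster --mds-plot "$dims" --threads "$threads" \
    --out "${prefix}_mds${dims}"
}
```

Usage: `plink19_mds dataset 20 8` would run the three steps for 20 dimensions on 8 threads.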
This was run on PLINK 1.07, single-threaded. PLINK 1.9 was installed on our HPC just this morning, so I haven't tried it yet.