Hi, all. I'm trying to do basic population structure analysis with ADMIXTURE because it's faster than STRUCTURE, but I can't figure out how to get the populations to cluster together. More generally, I can't figure out for sure what order my outputs are in, within the P and Q files. There's a similar unanswered question from about a year ago.
The only references to the output format in the otherwise helpful manual are these: "ADMIXTURE's ... output is simple space-delimited files containing the parameter estimates." ... "There is an output file for each parameter set: Q (the ancestry fractions), and P (the allele frequencies of the inferred ancestral populations)." ... "[If you use bootstrapping,] The 'se' file is in the same unadorned file format as the point estimates."
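In practice, "unadorned" seems to mean plain rows of numbers. The sketch below parses a Q file in Python. It assumes each line holds the K ancestry fractions for one individual, and that each line of a P file likewise holds K allele frequencies for one SNP. The function name `parse_q` is hypothetical, and the numbers are toy values, not real ADMIXTURE output.

```python
def parse_q(text):
    """Parse Q-file text into a list of rows, one list of K floats per line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rows.append([float(x) for x in line.split()])
    return rows

# Toy input, not real ADMIXTURE output.
example = "0.999990 0.000010\n0.500000 0.500000\n"
q = parse_q(example)
for row in q:
    # Ancestry fractions for one individual should sum to (about) 1.
    assert abs(sum(row) - 1.0) < 1e-4
print(len(q), len(q[0]))  # rows (individuals) and columns (K)
```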
Well, it's unadorned all right! I can't tell from the Q file which individuals have which fractions, and therefore I can't see whether they're grouping into the expected populations.
A natural assumption would be that the output is in the same order as the input file, but I'm not sure this is the case: I reversed the order of my input file, and very little changed in my output.
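Suppose that natural assumption holds, meaning Q rows appear in the same order as the individuals in the .fam file that accompanies the .bed. In that case the IDs can be attached by pairing the two files line by line. The sketch below works only under that assumption. `label_q_rows` is a hypothetical helper, and it refuses to pair files whose line counts differ. The fractions in the example are toy values.

```python
def label_q_rows(fam_text, q_text):
    """Pair each Q-file row with the FID and IID from the matching .fam line.

    ASSUMPTION: Q rows are in the same order as individuals in the .fam file.
    """
    # .fam columns: FID IID PID MID SEX PHENO; keep only the first two.
    fam = [line.split()[:2] for line in fam_text.splitlines() if line.strip()]
    q = [[float(x) for x in line.split()] for line in q_text.splitlines() if line.strip()]
    if len(fam) != len(q):
        raise ValueError(f"{len(fam)} individuals in .fam but {len(q)} rows in .Q")
    return [(fid, iid, row) for (fid, iid), row in zip(fam, q)]

fam_text = "2 NKL3_001 0 0 0 -9\n5 MTD_001 0 0 0 -9\n"
q_text = "0.9 0.1\n0.2 0.8\n"  # toy values
for fid, iid, row in label_q_rows(fam_text, q_text):
    print(fid, iid, row)
```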
The example HapMap data in the ADMIXTURE documentation does order and group predictably. I can reproduce the plot on page 6 using the commands on page 5. If I use plink to convert the .bed to .ped, move the Yoruba individuals to the top of the file, reconvert to .bed, rerun ADMIXTURE, and re-plot (whew), the YRI block moves to the front of the figure, as I would expect.
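The reordering step in that round trip comes after converting with PLINK's `--recode` and before converting back with `--make-bed`. It can be scripted instead of edited by hand. The sketch below assumes a whitespace-delimited .ped with the IID in column 2. `move_to_top` and the ID set are hypothetical, and the example lines are shortened.

```python
def move_to_top(ped_text, ids_first):
    """Return .ped text with individuals whose IID is in ids_first moved to the top.

    '#' comment lines stay first; relative order within each group is
    preserved because Python's sort is stable.
    """
    lines = [l for l in ped_text.splitlines() if l.strip()]
    comments = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if not l.startswith("#")]
    # False sorts before True, so the selected individuals come first.
    records.sort(key=lambda l: l.split()[1] not in ids_first)
    return "\n".join(comments + records) + "\n"

# Shortened toy lines (genotype columns trimmed).
ped = """# Stacks v1.41; PLINK v1.07; November 10, 2016
2 NKL3_001 0 0 0 0 0 0
5 MTD_001 0 0 0 0 0 0
6 ORC_001 0 0 0 0 0 0
"""
print(move_to_top(ped, {"ORC_001"}))
```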
But my own data doesn't behave like that. Here's a sample of the .ped file:
# Stacks v1.41; PLINK v1.07; November 10, 2016
2 NKL3_001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 NKL3_018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 NKL3_028 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 NKL3_029 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 MTD_001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 MTD_007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 MTD_010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 MTD_029 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 ORC_001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 ORC_002 0 0 0 0 0 0 0 0 G G A A A A T T C C T T
6 ORC_019 0 0 0 0 0 0 0 0 G G A A A A T T C C T T
6 ORC_020 0 0 0 0 0 0 0 0 A A A A A A T T C C T T
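The excerpt contains many `0` alleles, which PLINK treats as missing genotypes by default. Counting missingness per individual may be a useful sanity check on how much information each sample carries. The sketch below assumes the standard six leading .ped columns (FID, IID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per SNP. `missing_rate` is a hypothetical helper, and the two lines are copied from the excerpt above.

```python
def missing_rate(ped_line):
    """Fraction of SNPs in one .ped line where either allele is '0' (missing)."""
    fields = ped_line.split()
    alleles = fields[6:]  # skip FID, IID, PID, MID, SEX, PHENO
    # One (allele1, allele2) pair per SNP.
    calls = list(zip(alleles[0::2], alleles[1::2]))
    if not calls:
        raise ValueError("no genotype columns found")
    missing = sum(1 for a, b in calls if a == "0" or b == "0")
    return missing / len(calls)

sample = [
    "2 NKL3_001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
    "6 ORC_002 0 0 0 0 0 0 0 0 G G A A A A T T C C T T",
]
for line in sample:
    print(line.split()[1], missing_rate(line))
# NKL3_001 1.0
# ORC_002 0.25
```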
From the first column and from the sample names in the second column, you can see that there are three underlying/assumed populations. (Each population has about 30 individuals, but I edited for brevity.)
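The three assumed populations can be recovered from the .ped itself by grouping on the first column (FID). The sketch below uses a hypothetical helper, `individuals_by_fid`, which skips '#' comment lines such as the Stacks header. The example lines are shortened.

```python
from collections import defaultdict

def individuals_by_fid(ped_text):
    """Map each FID (column 1) to the list of IIDs (column 2), in file order."""
    groups = defaultdict(list)
    for line in ped_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fid, iid = line.split()[:2]
        groups[fid].append(iid)
    return dict(groups)

ped = """# Stacks v1.41; PLINK v1.07; November 10, 2016
2 NKL3_001 0 0 0 0
2 NKL3_018 0 0 0 0
5 MTD_001 0 0 0 0
6 ORC_001 0 0 0 0
"""
for fid, iids in individuals_by_fid(ped).items():
    print(fid, len(iids), iids)
```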
However, here is my output at K=2 and K=3: absolutely no clustering whatsoever.
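"No clustering" can be put into numbers by averaging the Q rows within each assumed population. With real structure at K=2 or K=3, each population's mean would be expected to lean toward one component. Near-identical means across populations would match what the plots show. The sketch below assumes you already have (population, Q row) pairs, for example from the FID column and a Q file read in the same order. The values are toy numbers.

```python
def mean_q_by_population(labeled):
    """Average Q rows per population.

    labeled: iterable of (population, q_row) pairs, q_row being K floats.
    Returns a dict mapping population -> list of K mean fractions.
    """
    sums = {}
    counts = {}
    for pop, row in labeled:
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
            counts[pop] = 0
        sums[pop] = [s + x for s, x in zip(sums[pop], row)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}

# Toy values, not real ADMIXTURE output.
labeled = [("2", [1.0, 0.0]), ("2", [0.5, 0.5]), ("5", [0.0, 1.0])]
print(mean_q_by_population(labeled))  # {'2': [0.75, 0.25], '5': [0.0, 1.0]}
```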
Now, at this point, you may be saying, "Well, maybe your individuals just aren't grouped into populations." Aside from the fact that we know they are, I ran a test: I reversed the order of the input file, expecting to see a mirror image of the first plot. But the plot stayed exactly the same. In case you're thinking maybe I glitched and used the same data or the same plot twice (it's okay, I think those things about myself too), the K=3 plot does show a few differences, but not a reversal. Does anyone know what's going on here?
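The reversal test can also be checked numerically rather than by eye. If Q rows follow input order, the Q file from the reversed run should match the original read bottom to top, with two caveats. First, the K columns may come out in a different order, because mixture-model cluster labels are arbitrary from run to run. Second, the values can differ slightly. The sketch below uses a hypothetical `is_reversed_run`, and its 0.05 tolerance is my own assumption. The numbers are toy values.

```python
from itertools import permutations

def is_reversed_run(q_orig, q_rev, tol=0.05):
    """True if q_rev equals q_orig read backwards, allowing any permutation
    of the K cluster columns and per-value differences up to tol."""
    if len(q_orig) != len(q_rev):
        return False
    expected = q_orig[::-1]
    k = len(q_orig[0])
    for perm in permutations(range(k)):
        # Column perm[j] of the reversed run is compared to column j of the original.
        if all(abs(row_rev[perm[j]] - row_exp[j]) <= tol
               for row_exp, row_rev in zip(expected, q_rev)
               for j in range(k)):
            return True
    return False

# Toy values: q_rev is q_orig reversed, with the two columns swapped.
q_orig = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
q_rev = [[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]]
print(is_reversed_run(q_orig, q_rev))   # True
print(is_reversed_run(q_orig, q_orig))  # False: same order, not reversed
```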
I have the same question and would like to know the order of the individuals in the output. For the rest, you can sort your file with this (R, for K=3):
tbl = read.table("~/Desktop/file_out_of_admixture.Q")  # one row per individual, one column per cluster
ord = tbl[order(tbl$V1, tbl$V2, tbl$V3), ]                # sort rows by the three ancestry fractions
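The same sort can be done while keeping each individual's IDs attached, so sorted rows can still be matched back to samples. The sketch below sorts by population first, so each population forms its own block. It then sorts by the ancestry fractions, as the R `order()` call does. It assumes (FID, IID, fractions) tuples and uses a hypothetical `sort_labeled_q`. FIDs are compared as strings here, so "10" would sort before "2". The fractions are toy values.

```python
def sort_labeled_q(labeled):
    """Sort (fid, iid, q_row) tuples by FID, then by the Q fractions in order."""
    return sorted(labeled, key=lambda rec: (rec[0], rec[2]))

# Toy values, not real ADMIXTURE output.
labeled = [
    ("5", "MTD_001", [0.1, 0.9]),
    ("2", "NKL3_018", [0.7, 0.3]),
    ("2", "NKL3_001", [0.6, 0.4]),
]
for fid, iid, row in sort_labeled_q(labeled):
    print(fid, iid, row)
```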