What is the relationship between PLINK .ped and .tped files?
9.0 years ago
haohanw ▴ 90

I wonder what the relationship is between PLINK .tped and .ped files. From what I observe, it seems to be more complicated than a simple transpose.

For example, Section 4.1.1 of this manual gives the following example:

1     1     0     0     1     1     1     1     G     G
1     2     0     0     2     1     0     0     A     G
1     3     0     0     1     1     1     1     A     G
1     4     0     0     2     1     2     1     A     A

is transposed as

1     snp1     0     10001     1     1     0     0     1     1     2     1
1     snp2     0     20001     G     G     G     A     G     A     A     A
#                                          ^     ^     ^     ^

rather than what I expected:

1     snp1     0     10001     1     1     0     0     1     1     2     1
1     snp2     0     20001     G     G     A     G     A     G     A     A
#                                          ^     ^     ^     ^
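A plain transpose of the kind expected here can be sketched in Python. This is only an illustration of the naive expectation, not of what PLINK does internally: `ped_to_tped` is a hypothetical helper, and the map rows (chromosome, SNP ID, genetic distance, position) are copied from the first four columns of the .tped lines quoted above.

```python
def ped_to_tped(ped_lines, map_rows):
    """Naively transpose PED rows into TPED rows, keeping allele order as given."""
    samples = [line.split() for line in ped_lines]
    tped = []
    for i, (chrom, snp_id, cm, pos) in enumerate(map_rows):
        fields = [chrom, snp_id, cm, pos]
        for cols in samples:
            # Skip the six leading columns: FID, IID, father, mother, sex, phenotype.
            genotypes = cols[6:]
            # Each SNP occupies two allele columns per individual.
            fields += genotypes[2 * i: 2 * i + 2]
        tped.append(" ".join(fields))
    return tped


ped = [
    "1 1 0 0 1 1 1 1 G G",
    "1 2 0 0 2 1 0 0 A G",
    "1 3 0 0 1 1 1 1 A G",
    "1 4 0 0 2 1 2 1 A A",
]
snp_map = [("1", "snp1", "0", "10001"), ("1", "snp2", "0", "20001")]

naive = ped_to_tped(ped, snp_map)
for line in naive:
    print(line)
```

The second line this prints keeps the heterozygotes as "A G", which is the expected output above and differs from the "G A" shown in the manual.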

Why are the alleles reversed here?

I also don't think this reversal is guaranteed to happen: in the example in Section 3.4 of the same manual, it's hard to see any pattern in which genotypes get reversed and which do not.

(I am quite new to this area, and I hope the answer is not something so basic that it counts as common knowledge in this domain.)

plink SNP GWAS • 4.3k views
9.0 years ago

Interesting, I didn't know about that! Could it be that PLINK internally just sorts the alleles using some arbitrary rules?

I just ran a test with input alleles "G A" and "A G" in various combinations with other SNPs, and they always came out as "G A" in the transposed dataset.

Similarly, "G T"/"T G" always becomes "G T", "G C"/"C G" always becomes "G C", "A T"/"T A" always becomes "A T", and "A C"/"C A" always becomes "A C". So the order can't be alphabetical, since "G A" and "G C" are not in alphabetical order.

The funny thing is, if I repeat the same test with PLINK2, I get alphabetically sorted alleles: your example becomes G G A G A G A A (and my test cases come out alphabetically sorted, too). That makes me think the order is rather arbitrary and doesn't particularly matter.
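If the order of the two alleles within a genotype really carries no information (which is the assumption here, since these are unphased genotypes), the two outputs can be checked for equivalence by comparing genotypes as unordered pairs. A minimal sketch, where `unordered_genotypes` is a hypothetical helper and the two lines are the snp2 rows quoted in this thread:

```python
def unordered_genotypes(tped_line):
    """Return one sorted allele pair per individual, so "G A" and "A G" compare equal."""
    cols = tped_line.split()
    alleles = cols[4:]  # skip chromosome, SNP ID, genetic distance, position
    return [tuple(sorted(alleles[i:i + 2])) for i in range(0, len(alleles), 2)]


plink1_line = "1 snp2 0 20001 G G G A G A A A"  # PLINK 1.07 output from the question
plink2_line = "1 snp2 0 20001 G G A G A G A A"  # output reported for PLINK2

same_genotypes = unordered_genotypes(plink1_line) == unordered_genotypes(plink2_line)
print(same_genotypes)
```

Under that assumption both lines describe the same four genotypes, even though a plain text diff would flag them as different.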

Edit: I think it has to do with the way PLINK 1.07 stores genotypes as numbers. If you run

plink --file mytest --recode --transpose

you get the inconsistent behaviour described above, but if you run

plink --file mytest --recode12 --transpose

so that all genotypes are recoded numerically, you always see "1 2" for all test cases, so these genotypes seem to be sorted numerically rather than alphabetically!
