Phased Data In Plink
5
13.6 years ago
Pierre ▴ 500

Hello everybody,

I wonder whether there is an easy way to preserve haplotypes when performing some basic operations with plink. Are there any options for this? I couldn't find any on my own.

The situation is as follows: we have phased data in the .ped and .map format. We want to apply some filters (e.g. keep SNPs with minor allele frequency above 5% in our set of individuals, keep a subset of individuals, etc.). But we found that plink does not keep the phase in this case: the alleles within each genotype are mixed up in the output files.

Thanks for the support! Regards, Pierre

plink genotyping • 12k views
7
13.4 years ago

When you recode your ped, plink sets the minor allele as A1. plink does not guarantee that it will keep your phase, but you can probably keep the alleles in the order you supplied them (and so keep the phase) if you use --keep-allele-order.

3
13.6 years ago

This is one reason we do not use PLINK. We prefer to use HelixTree by Golden Helix, where such issues are not a problem.

0

I will check it out!

3
6.1 years ago
Shicheng Guo ★ 9.6k

plink2 solved the problems you mentioned here: link

plink2 --vcf chr1.vcf --make-pgen --out chr1

The --pfile flag usually causes the binary fileset prefix.pgen + prefix.pvar + prefix.psam to be referenced, while --pgen/--pvar/--psam let you fully name one file at a time. New features supported by these formats include:

- Reliable tracking of REF vs. ALT alleles.
- Computationally efficient compression of low-MAF and high-LD variants.
- Phased genotypes.
- Dosages.
- VCF-style header information (including species-specific chromosome info, so you don't have to constantly use --chr-set).
- Multiallelic variants.
- Multiple phenotypes.
- Named categorical phenotypes.

2
13.6 years ago
Stephanie ▴ 20

I also couldn't find anything on PLINK's website about a way to input a phased format.

Have you tried just getting the MAF report (--freq), finding which SNPs fail the threshold you set, and then having PLINK remove those specific SNPs (--exclude snplist.txt)? It isn't quite as elegant but might work.
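The step of turning the --freq report into an exclusion list could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the report is whitespace-delimited with a header row containing SNP and MAF columns, and the function name `snps_below_maf` and the choice to skip SNPs whose MAF is reported as NA are illustrative assumptions, not PLINK behaviour.

```python
def snps_below_maf(frq_text, threshold=0.05):
    """Return IDs of SNPs whose MAF is below `threshold`.

    Assumes a whitespace-delimited report with a header row that
    contains SNP and MAF columns.
    """
    lines = frq_text.strip().splitlines()
    header = lines[0].split()
    snp_col = header.index("SNP")
    maf_col = header.index("MAF")

    failing = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        maf = fields[maf_col]
        # Assumption: SNPs with an undefined MAF ("NA") are skipped here;
        # you may prefer to exclude them as well.
        if maf == "NA":
            continue
        if float(maf) < threshold:
            failing.append(fields[snp_col])
    return failing


example_report = """\
 CHR  SNP   A1   A2          MAF  NCHROBS
   1  rs1    A    G       0.0200      100
   1  rs2    T    C       0.3100      100
   1  rs3    0    G           NA        0
"""

# One SNP ID per line, which is the layout --exclude expects.
exclude_list = "\n".join(snps_below_maf(example_report)) + "\n"
```

In practice you would read the report from the file written by --freq and write `exclude_list` to snplist.txt before rerunning plink with --exclude.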

0

Thanks Stephanie. That could be an option, BUT I think plink mixes up the phase in any case. If you give the following simple command: plink --file input --recode --out output (so nothing is done and there shouldn't be any differences between input and output), the phase is lost anyway. So, OK, there is no way to keep the phase using plink. :-S
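One way to check this kind of round trip is to compare a line of the input .ped with the matching line of the output and count genotypes whose two alleles come back in the opposite order. The sketch below assumes the usual six leading columns before the genotypes; the function name `count_swapped_genotypes` is made up for illustration.

```python
def count_swapped_genotypes(ped_line_in, ped_line_out):
    """Count genotypes whose two alleles appear in reversed order.

    Both arguments are single .ped lines for the same individual,
    assumed to have six leading columns followed by allele pairs.
    """
    alleles_in = ped_line_in.split()[6:]
    alleles_out = ped_line_out.split()[6:]
    if len(alleles_in) != len(alleles_out) or len(alleles_in) % 2:
        raise ValueError("lines must contain the same even number of alleles")

    swapped = 0
    for i in range(0, len(alleles_in), 2):
        pair_in = (alleles_in[i], alleles_in[i + 1])
        pair_out = (alleles_out[i], alleles_out[i + 1])
        # A swap means the same two alleles, but in the opposite order.
        if pair_in != pair_out and pair_in == pair_out[::-1]:
            swapped += 1
    return swapped
```

A non-zero count for any individual means the within-genotype order, and so the phase, did not survive the recode.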

1
13.4 years ago
Pierre ▴ 500

Thanks,

we found a way in the meantime that may be useful: you recode each diploid individual as 2 haploid ones

Ind1 Ind1 0 0 0 0 A T G C
Ind2 Ind2 0 0 0 0 G C A T

becomes

Ind1 Ind1_a 0 0 0 0 A A G G
Ind1 Ind1_b 0 0 0 0 T T C C 
Ind2 Ind2_a 0 0 0 0 G G A A
Ind2 Ind2_b 0 0 0 0 C C T T


but you then increase the size of your data ~2-fold...
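The recoding shown above could be scripted along these lines. This is a minimal sketch, assuming each .ped line has six leading columns followed by phased allele pairs (first allele from haplotype a, second from haplotype b); the function name `split_phased_ped_line` is only illustrative.

```python
def split_phased_ped_line(line):
    """Turn one phased diploid .ped line into two homozygous haploid lines."""
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2:
        raise ValueError("expected an even number of alleles")

    hap_a = alleles[0::2]  # first allele of every genotype
    hap_b = alleles[1::2]  # second allele of every genotype

    rows = []
    for suffix, hap in (("_a", hap_a), ("_b", hap_b)):
        # Keep the family ID, tag the individual ID with the haplotype suffix.
        new_meta = [meta[0], meta[1] + suffix] + meta[2:]
        # Write each allele twice so the pseudo-individual is homozygous.
        doubled = [copy for allele in hap for copy in (allele, allele)]
        rows.append(" ".join(new_meta + doubled))
    return rows


for row in split_phased_ped_line("Ind1 Ind1 0 0 0 0 A T G C"):
    print(row)
```

Because every pseudo-individual is homozygous at every site, any reordering of alleles within a genotype can no longer change the haplotype.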

0

Hi Pierre,

I'm new to bioinformatics, and to PLINK (obviously). Sorry for asking what may be a silly question... The PED files I've been given for analysis are also in the format you mentioned (since they are phased data). Will having two haploid individuals from the same person interfere with downstream analysis? I don't know if this question makes sense...

Thanks...
