Make A Special File By Plink
11.2 years ago
mary ▴ 210

Dear All

I am working on SNP genotype data from the Bovine 50K BeadChip. As you all know, the PED file format looks like this:

FAM001 1 0 0 1 2 A A G G A C

I want to have this file:

FAM001 1 0 0 1 2 A G A
FAM001 1 0 0 1 2 A G C

I mean I want each SNP genotype in a single column, with the two alleles split across two rows. Is there a PLINK command for making this file? I would appreciate it if someone could help me.

plink ped • 4.8k views

It would help if you could tell us why you need it in this format.

Hi

I want to make an input file for the Sweep v1.1 software. Sweep accepts a standard format of genotype data, fully phased with missing data filled in. It needs two files: 1. a genotype data file and 2. a SNP data file. The genotype data file contains:

- Column 1: the individual identifier.
- Column 2: the chromosome identifier. For autosomes you should have two chromosomes per individual. We can label the two chromosomes T for transmitted and U for untransmitted (but it can be anything, e.g. A and B).
- Columns 3 – N: each column gives the allele for one SNP in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4.

For example:

1331-1331FF12 T 1 3 3 2

1331-1331FF12 U 1 1 1 2

1331-1331FM13 T 1 3 3 2

1331-1331FM13 U 1 3 3 4
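As a rough illustration of the genotype file layout described above, here is a minimal Python sketch that turns one already-phased haplotype into one output line. The helper name `sweep_genotype_line` is hypothetical, and the space delimiter is an assumption taken from the example rows; it does no phasing itself.

```python
# Allele coding as described above: A=1, C=2, G=3, T=4.
ALLELE_CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}


def sweep_genotype_line(sample_id, chrom_label, haplotype):
    """Build one genotype-file line (hypothetical helper).

    sample_id   -- individual identifier (column 1)
    chrom_label -- chromosome label such as "T" or "U" (column 2)
    haplotype   -- string of phased alleles in map order, e.g. "AGGC"
    """
    coded = [ALLELE_CODES[allele] for allele in haplotype]
    # Assumption: fields are space-separated, as in the example rows.
    return " ".join([sample_id, chrom_label] + coded)


print(sweep_genotype_line("1331-1331FF12", "T", "AGGC"))
print(sweep_genotype_line("1331-1331FF12", "U", "AAAC"))
```

An allele outside A, C, G, T (for example a missing call) raises a `KeyError` here, which matches the requirement that missing data be filled in beforehand.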

The SNP data file has 3 tab-delimited columns, which give information about the markers you genotyped:

- Column 1: the SNP identifier. This can be an rs number or any other name you choose to give.
- Column 2: the chromosome.
- Column 3: the SNP position based on the build identified (UMD3.1 for my data).

For example:

snpid chr HG16

rs267265 3 45548733

rs267262 3 45567119

rs267241 3 45578901
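The SNP file could probably be derived from the PLINK .map file, whose four columns are chromosome, SNP identifier, genetic distance, and base-pair position. A minimal Python sketch follows; the function name `map_to_snp_rows` and the `build_label` argument are hypothetical, and the header row simply mirrors the example above.

```python
def map_to_snp_rows(map_lines, build_label):
    """Convert PLINK .map lines into tab-delimited SNP-file rows.

    map_lines   -- iterable of lines: chromosome, SNP id, genetic distance, bp position
    build_label -- text for the third header column (an assumption, e.g. "HG16")
    """
    rows = ["\t".join(["snpid", "chr", build_label])]
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp_id, _genetic_distance, position = fields[:4]
        rows.append("\t".join([snp_id, chrom, position]))
    return rows


example_map = [
    "3 rs267265 0 45548733",
    "3 rs267262 0 45567119",
]
for row in map_to_snp_rows(example_map, "HG16"):
    print(row)
```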

Thanks for your attention.

Thanks for explaining what you need the data for, but I would caution you that your data are most likely not fully or even partially phased. Array data are not phased by haplotype, and if you require phasing you need to first apply a method that attempts to phase your genotypes by haplotype.
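To illustrate why unphased genotypes are ambiguous, here is a small Python sketch that lists every pair of haplotypes consistent with a set of unphased genotypes. The function name `possible_haplotype_pairs` and the toy genotypes are made up for illustration; real phasing tools choose among these candidates statistically rather than enumerating them.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """List the distinct haplotype pairs consistent with unphased genotypes.

    genotypes -- list of (allele1, allele2) tuples, one per SNP, in map order
    """
    pairs = set()
    # For each SNP, choose which of the two alleles goes on the first chromosome.
    for choice in product([0, 1], repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The order of the two chromosomes does not matter, so store them sorted.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Toy individual: homozygous at SNPs 1 and 4, heterozygous at SNPs 2 and 3.
unphased = [("A", "A"), ("G", "A"), ("G", "A"), ("C", "C")]
for pair in possible_haplotype_pairs(unphased):
    print(pair)
```

With k heterozygous SNPs there are 2^(k-1) distinct pairs, so the two candidates here quickly become far too many to pick by eye across a whole chromosome.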

Hi Matt, thanks for your guidance. I am new to haplotype phasing. I know I can use fastPHASE or PHASE for haplotype phasing (Stephens, Smith et al. 2001). I tried it on Linux, but I don't have enough memory to reconstruct haplotypes for a whole chromosome. Also, when I reconstructed a partial segment, the output was not in Sweep input format, so I thought maybe I could use PLINK. I would appreciate it if you could help me with haplotype reconstruction.

I don't think I can help you much with the actual work, but just wanted to make sure you weren't expecting that your genotypes were already phased.

Have a look at SHAPEIT, it is "multi-threaded to tailor computational times to your resources."

11.2 years ago

You might want to try plink --file data --recodeAB --out dataAB (note that --file takes the file prefix, without the .ped extension), which will recode ACGT alleles as A or B depending on the major/minor allele. Then you can do sed -e 's/A A/0/g' -e 's/A B/1/g' -e 's/B B/2/g' dataAB.ped > data012.ped. This last command just collapses each genotype to a single code: A A > 0, A B > 1, B B > 2.
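The sed one-liner matches text anywhere on the line, so it relies on every genotype being written exactly as A A, A B, or B B. A column-aware alternative can be sketched in Python. The function name `ped_ab_to_counts` is hypothetical; treating anything other than A/B (such as a 0 0 missing call) as "NA" is an assumption, as is accepting B A as a heterozygote.

```python
def ped_ab_to_counts(line, n_leading=6):
    """Collapse A/B allele pairs in one PED line to B-allele counts (0, 1, 2).

    line      -- one whitespace-delimited PED line with A/B-coded alleles
    n_leading -- number of non-genotype columns (6 in a standard PED file)
    """
    fields = line.split()
    leading, alleles = fields[:n_leading], fields[n_leading:]
    counts = []
    # Walk the allele columns two at a time: one pair per SNP.
    for a1, a2 in zip(alleles[0::2], alleles[1::2]):
        if a1 in {"A", "B"} and a2 in {"A", "B"}:
            counts.append(str((a1 == "B") + (a2 == "B")))
        else:
            counts.append("NA")  # assumption: anything else is a missing call
    return " ".join(leading + counts)


print(ped_ab_to_counts("FAM001 1 0 0 1 2 A A A B B B 0 0"))
```

Because only the genotype columns are touched, family or sample IDs that happen to contain the letters A or B are left alone.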

Hi Matt, thanks for your reply. Actually, I want a PED file that is recoded as 1234, with each sample repeated on two lines and each column giving the allele for one SNP in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4.

1331-1331FF12 1 0 0 2 1 3 3 2

1331-1331FF12 1 0 0 2 1 1 1 2

The first row therefore represents one chromosome for individual 1331-1331FF12 with the haplotype AGGC. The second row represents the other chromosome for individual 1331-1331FF12 with the haplotype AAAC.

I see now. I'm not sure there's a simple plink command for this. The PED file does not contain phased haplotypes at all, so you'll have to impute haplotypes somehow first: http://pngu.mgh.harvard.edu/~purcell/plink/haplo.shtml

11.2 years ago
zx8754 12k

If the genotypes are needed in 1234 format, use plink --recode --allele1234.

Then, here is some quick R code:

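# dummy data: one PED row per individual, two allele columns per SNP
# (colClasses keeps alleles as text, so an all-T column is not read as TRUE)
x <- read.table(text="
FAM001 1 0 0 1 2 A A G G A C
FAM002 1 0 0 1 2 A T G C C C
FAM003 1 0 0 1 2 1 1 3 3 1 2
FAM004 1 0 0 1 2 1 4 3 2 2 2
                ", colClasses = "character")

# subset: first allele of each SNP goes to the "T" row, second allele to the "U" row
# (this only splits the columns, it does not phase the genotypes)
x1 <- cbind(x[,c(1:6)],"T",x[,seq(7,ncol(x),2)])
x2 <- cbind(x[,c(1:6)],"U",x[,seq(8,ncol(x),2)])

# make same colnames for "rbind"
colnames(x2) <- colnames(x1)

# join
x3 <- rbind(x1,x2)

# sort by "FamID" and "TU"
result <- x3[with(x3, order(x3[,1],x3[,7])), ]

# output
result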

      V1 V2 V3 V4 V5 V6 "T" V7 V9 V11
1 FAM001  1  0  0  1  2   T  A  G   A
5 FAM001  1  0  0  1  2   U  A  G   C
2 FAM002  1  0  0  1  2   T  A  G   C
6 FAM002  1  0  0  1  2   U  T  C   C
3 FAM003  1  0  0  1  2   T  1  3   1
7 FAM003  1  0  0  1  2   U  1  3   2
4 FAM004  1  0  0  1  2   T  1  3   2
8 FAM004  1  0  0  1  2   U  4  2   2