8.6 years ago
PAn
Hello everyone,
I have a regular PLINK .ped file and needed to convert it to SNP-by-sample format, which I did using the .tped format:
1 snp1 0 5000650 A A A C C C A C C C C C
1 snp2 0 5000830 G T G T G G T T G T T T
Is there any way I can convert it into: 1) compound genotype format
1 snp1 0 5000650 AA AC CC AC CC CC
1 snp2 0 5000830 GT GT GG TT GT TT
2) and convert it to single-coded files:
1 snp1 0 5000650 0 1 2 1 2 2
1 snp2 0 5000830 1 1 2 0 1 0
Thanks a lot!
Add the --recodeA option to your PLINK command line.
http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml
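For the first request (compound genotypes), the pairs of allele columns in the .tped can also be merged outside PLINK. A minimal awk sketch, assuming a space-delimited .tped with four leading columns followed by two allele columns per sample; the function name tped_to_compound is hypothetical, not a PLINK feature:

```shell
# Hypothetical helper: merge each pair of allele columns in a .tped
# into one compound genotype (A C -> AC). Reads stdin, writes stdout.
tped_to_compound() {
    awk '{
        # Columns 1-4: chromosome, SNP id, genetic distance, position.
        out = $1 " " $2 " " $3 " " $4
        # Remaining columns come in pairs, one pair per sample.
        for (i = 5; i < NF; i += 2)
            out = out " " $i $(i + 1)
        print out
    }'
}
```

Something like tped_to_compound < mydata.tped > mydata.compound.txt would then stream the file one SNP at a time, so memory use stays small even for very wide files.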
Thanks a lot, that works! I tried --recodeAD and didn't quite understand the results, I guess. Edit: I am getting the .raw file in Sample vs SNP format. Is there any way to get it in SNP vs Sample format? Thanks! I can try to transpose in bash, but the file is too big, and it would be nice to know if there is a built-in function.
How many samples and SNPs do you have?
It's around 1600 samples and 23 million SNPs. I am trying awk to transpose, but the file is huge, so it's taking forever!
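One way to sidestep the transpose entirely is to code the genotypes straight from the .tped, which is already SNP by sample. If PLINK 1.9 is available, its --recode A-transpose mode should write a variant-major .traw file directly; otherwise, here is a rough awk sketch. It assumes biallelic SNPs, "0" as the missing allele code, and that the allele to count is the rarer one in each row (ties keep the first allele seen). The function name tped_to_additive is made up for illustration.

```shell
# Hypothetical helper: recode a .tped as per-sample allele counts (0/1/2),
# keeping SNP-by-sample orientation. Reads stdin, writes stdout.
tped_to_additive() {
    awk '{
        split("", n)              # reset allele counts for this SNP
        first = ""
        for (i = 5; i <= NF; i++) {
            if ($i == "0") continue           # skip missing alleles
            if (first == "") first = $i
            n[$i]++
        }
        # Assumption: count the rarer allele; ties keep the first seen.
        minor = first
        for (a in n) if (n[a] < n[minor]) minor = a
        out = $1 " " $2 " " $3 " " $4
        for (i = 5; i < NF; i += 2) {
            if ($i == "0" || $(i + 1) == "0") g = "NA"
            else g = ($i == minor) + ($(i + 1) == minor)
            out = out " " g
        }
        print out
    }'
}
```

Note that which allele gets counted is a convention: the .raw header from --recodeA names the counted allele for each SNP, so it is worth checking that the two agree before mixing outputs.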
First, extract each row into its own file:
sed '1q;d' FILE.txt > 1.txt
sed '2q;d' FILE.txt > 2.txt
sed '3q;d' FILE.txt > 3.txt
etc.
Then turn each row into a column:
tr ' ' '\n' < 1.txt > 1_t.txt
tr ' ' '\n' < 2.txt > 2_t.txt
tr ' ' '\n' < 3.txt > 3_t.txt
etc.
Or as a one-liner per row:
sed '1q;d' FILE.txt | tr ' ' '\n' > 1_t.txt
etc.
And then finally:
paste 1_t.txt 2_t.txt 3_t.txt etc > final_file
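The same row-to-column idea can be wrapped in one function so the per-row commands do not have to be typed out. This is only a sketch under a few assumptions: the input is space-delimited, mktemp -d is available, and the name transpose_by_rows is hypothetical. It writes one temporary column file per input row in a single awk pass, then pastes them side by side.

```shell
# Hypothetical helper: transpose a space-delimited file by writing each
# row as a one-column file, then pasting the columns together.
transpose_by_rows() {
    in_file=$1
    out_file=$2
    tmp_dir=$(mktemp -d) || return 1
    awk -v dir="$tmp_dir" '{
        # Zero-padded names keep the shell glob in row order.
        f = sprintf("%s/%08d.col", dir, NR)
        gsub(/ +/, "\n")          # one field per line
        print > f
        close(f)                  # avoid running out of open files in awk
    }' "$in_file"
    paste -d " " "$tmp_dir"/*.col > "$out_file"
    rm -rf "$tmp_dir"
}
```

Usage would be along the lines of transpose_by_rows FILE.txt final_file. With about 1600 rows, paste has to open about 1600 files at once, which may exceed the default per-process open-file limit on some systems, so that limit may need raising or the paste step may need to be done in batches.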