This is possible but you are missing a lot of information, namely:
- Family ID (FID)
- Paternal ID (PID)
- Maternal ID (MID)
- gender/sex
You are also missing a map file. See the map file format here.
You can create a temporary (and incomplete) map with the following code, which is specific to your dataset and assumes GNU sed (as on Linux):
head -1 plink.raw | sed 's/ \+/\n/g' | sed '1,2d' | awk '{print "0\t"$0"\t0\t0"}' > plink.map
cat plink.map
0 rs3117294 0 0
0 rs2747453 0 0
0 rs2747454 0 0
0 rs2747457 0 0
0 rs3131888 0 0
Next, you have to edit your main data to get it into a pseudo-PED format. Read about PED files here, and their input here.
sed '1d' plink.raw | sed 's/_/ /g' > plinkv2.raw
cat plinkv2.raw
D0024949 0 C C A G A G A A A G
D0024302 0 A C A A A G A A A A
D0023151 0 C C A G A A A A G G
D0022042 0 A C A A G G A A A A
D0021275 0 C C A G A G A A A G
D0021163 0 A A A A G G A A A A
D0020795 0 A A A A G G A C A G
D0020691 0 A A A A G G A C A G
D0019121 0 A A A A G G C C G G
Then you can input your data, but you have to specify that you're missing FID, PID, MID, and gender/sex.
/Programs/plink1.90/plink --file --ped plinkv2.raw --map plink.map --no-fid --no-sex --no-parents
PLINK v1.90b3.38 64-bit (7 Jun 2016) https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--file
--map plink.map
--no-fid
--no-parents
--no-sex
--ped plinkv2.raw
15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (5 variants, 9 people).
--file: plink.bed + plink.bim + plink.fam written.
Thank you for your help, but when I apply the following code to my data it gives me an empty file:
head -1 plink.raw | sed 's/ +/\n/g' | sed '1,2d' | awk '{print "0\t"$0"\t0\t0"}' > plink.map
Are you using a Mac? I use Linux (Ubuntu).
All you need for the map is a list of your SNP IDs, surrounded by 1 column of zeros on the left and 2 columns of zeros on the right.
What is the aim of your analysis, by the way? Are you just doing association testing?
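On macOS the sed step is the likely culprit: as far as I know, BSD sed does not treat \+ and \n the way GNU sed does, so the header stays on one line and sed '1,2d' then deletes it. Here is a sketch that builds the same placeholder map with awk only. The function name make_map is made up for illustration, and it assumes the first two header fields are the sample ID and phenotype columns, with every later field being a SNP ID.

```shell
# Hypothetical helper: build a placeholder .map from the header line of a .raw-style file.
# Assumption: header fields 1 and 2 are sample ID and phenotype; fields 3+ are SNP IDs.
make_map() {
    awk 'NR == 1 {
        for (i = 3; i <= NF; i++)
            print "0\t" $i "\t0\t0"   # chromosome, SNP ID, genetic distance, bp position
        exit
    }' "$1" > "$2"
}

# Usage: make_map plink.raw plink.map
```

Because awk splits on any run of spaces or tabs by default, this does not depend on which sed you have.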
Yes, I'm trying to do association testing.
Yes, I use a Mac. Anyway, I solved the .map issue.
After I solved the .map issue, I got the following error:
A problem with line 1 in [ plinkv2.raw ] Expecting 2 + 2 * 199 = 400 columns, but found 392
What should I do here?
Looks like your 'PED' file is incomplete. PLINK found 199 variants in your map and therefore expected 2 * 199 allele columns in the PED file (two alleles per variant), plus 2 extra columns for sample ID and phenotype. Line 1 has only 392 columns, so it is 8 columns short.
Just look over the files to ensure that there are no formatting issues. If you only have 199 SNPs, this should not take long.
I think the PED file is incomplete, as you said, but I don't know why. I followed your code but I am still having this issue.
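Rather than checking by eye, you could let awk do the same arithmetic PLINK does and list every line whose column count is off. This is only a sketch: check_ped_columns is a made-up name, and it assumes the pseudo-PED layout used above (sample ID, phenotype, then two allele columns per variant).

```shell
# Hypothetical helper: report lines of a pseudo-PED file whose column count
# differs from 2 + 2 * (number of non-empty lines in the map file).
check_ped_columns() {
    # Count non-empty map lines; "n + 0" prints 0 for an empty file.
    n_snps=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$2")
    want=$((2 + 2 * n_snps))
    awk -v want="$want" 'NF != want {
        printf "line %d: %d columns, expected %d\n", NR, NF, want
    }' "$1"
}

# Usage: check_ped_columns plinkv2.raw plink.map
```

No output means every line has the expected number of columns.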
My code is only based on the small sample that you provided, which may not be applicable to the entire dataset.
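One possible cause, and this is a guess since I have not seen the full file: the sed 's/_/ /g' step only yields two allele columns when a genotype is written like C_C. Any entry in another form (for example a single missing-value token) would come out one column short. The sketch below lists such entries in the original file. The name find_odd_genotypes is made up, and it assumes a header on line 1 and genotypes starting in column 3.

```shell
# Hypothetical helper: list genotype entries that are not of the form ALLELE_ALLELE.
# Assumption: line 1 is a header; columns 1 and 2 are sample ID and phenotype.
find_odd_genotypes() {
    awk 'NR > 1 {
        for (i = 3; i <= NF; i++)
            if ($i !~ /^[^_]+_[^_]+$/)   # anything other than exactly one underscore between two tokens
                printf "line %d, column %d: %s\n", NR, i, $i
    }' "$1"
}

# Usage: find_odd_genotypes plink.raw
```

If this prints anything, those entries need to be recoded before the underscore replacement.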
One more thing that you could try is opening your file with the vi editor and checking to see if there is a ^M at the end of each line. In that case, use the dos2unix command to get rid of these, and then retry the code.
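If you would rather not open the file in an editor, or dos2unix is not installed, the same check and fix can be sketched with awk and tr. The function names here are made up for illustration; ^M is how a carriage return (\r) is displayed.

```shell
# Hypothetical helpers for Windows-style line endings.

# Print how many lines end in a carriage return (shown as ^M in vi).
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the input file with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Usage:
#   count_cr_lines plink.raw
#   strip_cr plink.raw plink.unix.raw
```

Writing to a new file keeps the original untouched in case something goes wrong.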