If your .csv file contains the data required for the .ped and .map formats, you can use it directly. For the .ped file, the mandatory columns are: Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1=male; 2=female; other=unknown), Phenotype. You need these data to run Plink. Then instead of a command:
To make this clearer, I am pasting a few columns of my ped and map files below.
Here is what my ped (.csv) file looks like. The respective columns are: IID, FID, PID, MID, Sex, Phenotype, SNP
1 1 0 0 2 1 TC TT AA AG CA CA AG GG GG GG CC GA TC TC GG
2 2 0 0 2 1 TC TT AA AA CA CA AA CG GA GT TT GA TC CC GG
3 3 0 0 2 1 TC CC AA AA CC AA AG CC AA GT TT AA CC CC GG
4 4 0 0 2 1 TC TT AA AA AA CC AG CG GA GT CT GA TC CC GG
5 5 0 0 2 1 TC TT AA AA CA CA GG CG AA TT CT AA CC CC GA
And my Map(.csv) looks like this. The respective columns are Chromosome, SNPid, Genetic position, Physical position
I also tried converting my CSV into TSV and got an error :
Error: Invalid chromosome code '17press' on line 1 of .map file.
(Use --allow-extra-chr to force it to be accepted.)
Then, I used --allow-extra-chr and I got another error :
Error: Invalid bp coordinate on line 1 of .map file.
Then I manually checked the coordinates of the 1st variant (rs1049620) on Google and found that it was actually wrong. For the record, this SNP has no entry in dbSNP, which is the largest hub of genetic variants, so I think it was fetched wrongly from some other database. I wonder how such an error could occur, since I fetched all those chromosomal locations using Ensembl Biomart. To confirm further, I also checked the other bp coordinates, but they were all correct.
I ran the above command again after correcting it, but it shows the same error:
Error: Invalid bp coordinate on line 1 of .map file.
I have spent my whole day on this and I still couldn't find the problem. :(
It would be great if someone could help me with it or suggest some alternative way of converting CSV/TSV into MAP format!
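One possible way to do the CSV-to-MAP conversion outside a spreadsheet is sketched below in Python. It assumes the CSV has exactly the four columns described above (chromosome, SNP id, genetic position, physical position) and no header row; the function name `csv_to_map_lines` is made up for this illustration and is not part of PLINK. It strips stray whitespace from each field and refuses rows whose bp coordinate is not a plain integer, so problems show up before PLINK sees the file.

```python
import csv
import io
import re

# Plain ASCII integer, optionally negative.
INTEGER = re.compile(r"-?[0-9]+")


def csv_to_map_lines(csv_text):
    """Turn CSV rows (chrom, SNP id, genetic pos, bp pos) into
    space-delimited .map lines. Hypothetical helper, not a PLINK API."""
    map_lines = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        # Remove leading/trailing whitespace around every field.
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(
                f"row {row_number}: expected 4 fields, got {len(fields)}: {fields!r}"
            )
        bp_position = fields[3]
        if not INTEGER.fullmatch(bp_position):
            raise ValueError(
                f"row {row_number}: bp coordinate is not an integer: {bp_position!r}"
            )
        map_lines.append(" ".join(fields))
    return map_lines


if __name__ == "__main__":
    # Two rows taken from the example .map file shown in this thread.
    example = "17,rs1049620,0,49404152\n6, rs1143684 ,0,3010156\n"
    for line in csv_to_map_lines(example):
        print(line)
```

To write a real file, the returned lines can be joined with "\n" and saved under whatever name is later passed to --map.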
You should not be getting the same error - take a look:
cat test.ped
1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A
cat test.map
17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
plink --ped test.ped --map test.map
PLINK v1.90b3.38 64-bit (7 Jun 2016) https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--map test.map
--ped test.ped
15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (4 variants, 5 people).
--file: plink.bed + plink.bim + plink.fam written.
Please check the formatting of your data again. Even something as small as an extra space can cause an issue.
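As a rough aid for that kind of formatting check, here is a small Python sketch that inspects .map lines and prints the offending token with `repr()`, so invisible characters become visible. The helper names `check_map_line` and `check_map_text` are invented for this example, the four-column layout is assumed, and the chromosome pattern is only an approximation for human data, not PLINK's actual rule.

```python
import re

# Rough approximation of human chromosome codes (assumption, not PLINK's rule).
CHROM_PATTERN = re.compile(r"(chr)?([0-9]{1,2}|X|Y|XY|MT|M)", re.IGNORECASE)
INT_PATTERN = re.compile(r"-?[0-9]+")


def check_map_line(line):
    """Return a list of problems found in one .map line (empty if none)."""
    problems = []
    body = line.rstrip("\n")
    # Split on spaces and tabs only, so other characters stay inside tokens.
    tokens = [token for token in re.split(r"[ \t]+", body) if token]
    if len(tokens) != 4:
        problems.append(f"expected 4 tokens, found {len(tokens)}: {tokens!r}")
        return problems
    chrom, _snp_id, _genetic_pos, bp_pos = tokens
    if not CHROM_PATTERN.fullmatch(chrom):
        problems.append(f"unusual chromosome code {chrom!r}")
    if not INT_PATTERN.fullmatch(bp_pos):
        problems.append(f"bp coordinate is not an integer: {bp_pos!r}")
    return problems


def check_map_text(text):
    """Map 1-based line numbers to their problems; clean lines are omitted."""
    report = {}
    for line_number, line in enumerate(text.split("\n"), start=1):
        if line.strip() == "":
            continue  # ignore empty lines
        problems = check_map_line(line)
        if problems:
            report[line_number] = problems
    return report


if __name__ == "__main__":
    sample = "17 rs1049620 0 49404152\n6,rs1143684,0,3010156\n"
    for line_number, problems in check_map_text(sample).items():
        print(line_number, problems)
```

A comma-separated line is reported as a single token here, which is one way a tool reading whitespace-delimited input could end up seeing fewer tokens than it expects.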
Yeah I had seen that post earlier and I posted my query after a lot of googling. So, the problem is when I tried that I got an error saying:
Error: Line 1 of .map file has fewer tokens than expected
Try spaces between the MAP columns. Also, be sure that there are no hidden carriage returns like ^M - try dos2unix
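If dos2unix is not at hand, the same check can be sketched in a few lines of Python by reading the file as raw bytes. The function names below are made up for this illustration, and the conversion only handles Windows-style CRLF line endings; it is similar in spirit to dos2unix rather than a replacement for it.

```python
def count_crlf(data: bytes) -> int:
    """Count Windows-style line endings (carriage return + line feed)."""
    return data.count(b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a plain line feed."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # In practice the bytes would come from open(path, "rb").read();
    # here two .map lines from this thread are given CRLF endings by hand.
    raw = b"17 rs1049620 0 49404152\r\n6 rs1143684 0 3010156\r\n"
    print(count_crlf(raw))       # 2
    print(crlf_to_lf(raw))
```

Reading in binary mode matters here, because Python's default text mode translates line endings and would hide the carriage returns.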
Hi Kevin, I did not understand what you mean by "no hidden carriage returns like ^M - try dos2unix" ?
FYI, all my files were created on Linux.
If you open your file in vi, do you see any unusual characters at the line ends?
No Kevin. It does not have any unicode or unusual characters.
Perhaps first try it with a minimal reproducible example of just a few variants.
Thank you Kevin for the valuable response but I am still getting the same error.
The problem is fixed. Thank you Kevin! :)