Formatting Beadstudio Final Report Into Plink
12.2 years ago
romsen ▴ 70

Hello again

I have to convert Illumina HumanHap chip data into PLINK format (PED file). I proceeded as described here, but my generated PED file shows only 0 for each genotype. PLINK prints these warnings during the process:

[...] 50 males, 50 females, and 0 of unspecified sex

Before frequency and genotyping pruning, there are 1000000 SNPs

100 founders and 0 non-founders found

1000000 SNPs with no founder genotypes observed

Warning, MAF set to 0 for these SNPs (see --nonfounders)

Writing list of these SNPs to [ plink.nof ]

Total genotyping rate in remaining individuals is 0 [...]

fam-file:

    1    192    0    0    1    0
    2    193    0    0    2    0
    3    213    0    0    1    0
    4    214    0    0    1    0

map-file:

    1    rs3934834    0    995669
    1    rs3737728    0    1011278
    1    rs6687776    0    1020428
    1    rs9651273    0    1021403

lgen-file:

    [Header]
    BSGT Version    3.0.27
    Processing Date
    Content
    Num SNPs    1000000
    Total SNPs    1000000
    Num Samples    100
    Total Samples    100
    [Data]
    Sample Index    Sample Name    SNP Name    Allele1    Allele2
    1    192    rs10000010    A    G
    2    193    rs10000010    A    G
    3    213    rs10000010    A    G

My lgen file has a 10-row header, followed by the data rows. The genotypes are given as the forward alleles exported via BeadStudio (exporting the Top alleles gives the same sobering result).

After running PLINK to reconstruct the PED file, I get this PED file in which every genotype is missing:

1 192 0 0 1 -9 0 0 0 0 0 0 0 0 [...]
2 193 0 0 2 -9 0 0 0 0 0 0 0 0 [...]
3 213 0 0 1 -9 0 0 0 0 0 0 0 0 [...]

Perhaps one of you can spot the mistake or has an idea how to solve the problem. Do I need a reference file, or is the header in the lgen file the problem? Thank you very much.

illumina plink • 9.7k views

Perfect, thank you. PLINK runs now, but there is a new error: it reports too many alleles for a locus in my file.

ERROR: Locus rs10000023 has >2 alleles:
       individual 12 070 has genotype [ - - ]
       but we've already seen [ T ] and [ G ]
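
To see which loci would trigger this error before running PLINK, one option is to collect the distinct allele codes per SNP. The following is only a minimal Python sketch: it assumes five whitespace-separated columns per lgen row (family ID, individual ID, SNP, allele 1, allele 2), and the sample rows and the function names `alleles_per_snp` and `offending_snps` are made up for illustration.

```python
from collections import defaultdict


def alleles_per_snp(lines):
    """Collect the distinct allele codes seen for each SNP in lgen-style rows."""
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip rows without exactly five whitespace-separated fields
        _fid, _iid, snp, a1, a2 = fields
        seen[snp].update((a1, a2))
    return seen


def offending_snps(lines, missing="0"):
    """Return SNPs that carry more than two codes besides the missing code."""
    result = {}
    for snp, codes in alleles_per_snp(lines).items():
        real = codes - {missing}
        if len(real) > 2:
            result[snp] = sorted(real)
    return result


# Hypothetical rows; only the "12 070 ... - -" row comes from the error above.
rows = [
    "11 069 rs10000023 T G",
    "12 070 rs10000023 - -",
    "13 071 rs10000010 A G",
]
print(offending_snps(rows))
```

In this sketch "-" shows up as a third allele code for rs10000023, which matches what PLINK complains about: it does not recognise "-" as a missing genotype.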

I've edited my answer to address the issue.


Nice, I had the same idea. I use Windows, though. Do you have a script for PLINK or Perl?


If you can open the file in a text editor and use "find and replace" to change every "-" to "0", I think that should work. Otherwise, if you are going to do much bioinformatics work on Windows, I would suggest installing Cygwin and becoming familiar with it.


Unfortunately it's too big. I can't open it in Notepad.


As a slightly less intimidating alternative to installing Cygwin for sed functionality, you could try the approach in this blog post about PowerShell.


Hehe, thanks, I'll check this. In the meantime I got it working with Perl:

perl -p -i.bak -e "~s|-|0|" file.lgen

Hi. I'm facing a similar issue to the above. I have made a .ped file from a BeadStudio report, but my missing values are coded as "-" rather than 0. The file is too big to find and replace in Nano, and the above Perl command replaces only the first occurrence on each line (in this case changing the phenotype code "-9" to "09"). I'm not familiar with Perl or command-line operations and wondered if anyone could help.


I've managed to get around the phenotype issue by using:

perl -p -i.bak -e "~s|- |0 |" file.lgen

But this still only deals with the first occurrence on each line of the ped file.

perl -p -i.bak -e "~s|- |0 |g" file.lgen

Adding the g (global) flag makes the substitution apply to every occurrence on a line, which fixes this for anyone encountering a similar problem.
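
One caveat: the pattern "- " only matches a "-" that is followed by a space, so a "-" at the very end of a line may be left untouched. A token-based replacement avoids that, and it also leaves the "-9" phenotype alone. This is a rough Python sketch, assuming a standard PED layout with six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by allele codes. The names `fix_ped_line` and `fix_ped_file` are hypothetical.

```python
def fix_ped_line(line, missing_in="-", missing_out="0", n_leading=6):
    """Replace missing-allele tokens in one PED line, leaving the ID columns alone."""
    fields = line.split()
    head, genos = fields[:n_leading], fields[n_leading:]
    # Only a token that is exactly "-" is replaced, so "-9" is never touched.
    genos = [missing_out if g == missing_in else g for g in genos]
    return " ".join(head + genos)


def fix_ped_file(src, dst):
    """Stream a PED file line by line so the whole file never has to fit in an editor."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(fix_ped_line(line) + "\n")


print(fix_ped_line("1 192 0 0 1 -9 A G - -"))
# prints: 1 192 0 0 1 -9 A G 0 0
```

Because it reads one line at a time and writes to a new file, it should also work on Windows, where the file was too large to open in an editor.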

12.2 years ago

I would try removing the header from your lgen file. The PLINK documentation gives an example without the header. If you are using Linux, try:

grep -E '^[0-9]+' lgen-file > lgen-file.noheader
sed -E 's/-[[:space:]]+-[[:space:]]*$/0 0/' lgen-file.noheader > lgen-file.noheader.missingalleles

This will remove the header, and then should replace all occurrences of "- -", which seems to be Illumina's notation for missing alleles, with "0 0", which seems to be PLINK's notation for missing alleles.
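
For those without grep and sed (for example on Windows), the same two steps could be sketched in Python. This is not a tested converter, just an illustration under some assumptions: the report has a "[Data]" line followed by one column-name row, as in the excerpt in the question, and every data row has five whitespace-separated fields. The function name `report_to_lgen` is made up.

```python
def report_to_lgen(lines):
    """Return lgen rows from BeadStudio final-report lines, with '-' recoded as '0'."""
    rows = []
    it = iter(lines)
    # Step 1: skip everything up to and including the "[Data]" marker,
    # then skip the column-name row that follows it.
    for line in it:
        if line.strip() == "[Data]":
            next(it, None)
            break
    # Step 2: recode missing alleles in the remaining rows.
    for line in it:
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or unexpected rows
        fields[3:] = ["0" if allele == "-" else allele for allele in fields[3:]]
        rows.append(" ".join(fields))
    return rows


# Small made-up report in the layout shown in the question.
report = [
    "[Header]",
    "Num SNPs\t1000000",
    "[Data]",
    "Sample Index\tSample Name\tSNP Name\tAllele1\tAllele2",
    "1\t192\trs10000010\tA\tG",
    "12\t070\trs10000023\t-\t-",
]
print(report_to_lgen(report))
# prints: ['1 192 rs10000010 A G', '12 070 rs10000023 0 0']
```

For a real file you would pass an open file handle instead of a list and write the rows out line by line. If the input has no "[Data]" line, this sketch returns no rows at all.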

