Converting Illumina Raw Genotype Data Into Plink Ped Format
13.1 years ago
P.NJ ▴ 50

Hello,

I have the "FinalReport.txt" file of raw Illumina genotype data for a 2.5M array, exported from GenomeStudio (GSGT Version 1.8.4).

For further analysis, I would like to convert this to PLINK format.

Is there any way of doing this? I would appreciate any suggestions.

Thank you.

conversion illumina genotyping plink • 23k views
13.1 years ago
Wen.Huang ★ 1.2k

You may also consider the .lgen format, which just takes the first few columns of the FinalReport. PLINK has an option (--lfile) to read the .lgen format and convert it to a PED file.
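
As a rough illustration of "taking the first few columns", here is a minimal Python sketch that turns FinalReport lines into .lgen rows. It assumes the report is tab-delimited, that the column header follows a `[Data]` line, and that no-calls are written as `-`. The function name `final_report_to_lgen` and the default column names are hypothetical, so adjust them to whatever your own report header says (for example `Allele1 - Top`).

```python
import csv

def final_report_to_lgen(lines, family_id="0",
                         snp_col="SNP Name", sample_col="Sample ID",
                         a1_col="Allele1 - Forward", a2_col="Allele2 - Forward"):
    """Yield .lgen rows (FID, IID, SNP, allele1, allele2) from FinalReport lines."""
    lines = iter(lines)
    # Skip the [Header] block; the column header is the line after [Data].
    for line in lines:
        if line.strip() == "[Data]":
            break
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # Assumption: no-calls appear as "-"; PLINK uses "0" for a missing allele.
        a1 = "0" if row[a1_col] == "-" else row[a1_col]
        a2 = "0" if row[a2_col] == "-" else row[a2_col]
        yield (family_id, row[sample_col], row[snp_col], a1, a2)
```

Each yielded tuple can be written out space-separated as one line of the .lgen file. The matching .fam and .map files still have to be created separately, with the same sample IDs and SNP names.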

13.1 years ago
P.NJ ▴ 50

Thank you for your suggestions. I tried the .lgen format, but it did not work for me. I created .map, .fam and .lgen files, but when I run PLINK, the resulting .ped file contains:

0 sample_name 0 0 2 1 0 0 0 0 0 ......

I have pasted some of the info from my file formats, maybe you could tell me if I am going wrong somewhere.

.map

chr# SNPName GeneticDistance bp units

24 GA008510 0 11771305

24 GA008524 0 19612089

.fam

Family_ID Individual_Name Paternal_ID Maternal_ID Sex Phenotype

0 sample_name 0 0 2 1

.lgen

Sample_Index Sample_ID SNP_Name Allele1Fwd Allele2Fwd

0 5528_C01 GA008510 C C

0 5528_C01 GA008524 T T

and then I try to run

plink --lfile test --recode

Any clue as to where I am going wrong?

ADD COMMENT

In your .fam file, did you actually put "5528_C01" as the sample name, or did you just put "sample_name"? The individual ID in the .fam file has to match the sample ID used in the .lgen file. I just ran a test and it worked for me.
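
A quick way to catch this kind of mismatch before running PLINK is to compare the ID pairs in the two files. This is only a sketch: `sample_ids` and `compare_ids` are hypothetical helper names, and it assumes both files are whitespace-separated with the family ID and individual ID in the first two columns.

```python
def sample_ids(lines):
    """Return the set of (FID, IID) pairs found in the first two columns."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids

def compare_ids(fam_lines, lgen_lines):
    """Return (IDs in .lgen but not in .fam, IDs in .fam but not in .lgen)."""
    fam = sample_ids(fam_lines)
    lgen = sample_ids(lgen_lines)
    return sorted(lgen - fam), sorted(fam - lgen)
```

With the files shown above, the first list would contain the pair 0/5528_C01 and the second the pair 0/sample_name, which points straight at the placeholder left in the .fam file.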

ADD REPLY

Ahh, okay, I had not changed that... Thank you very much. It worked for me as well.

ADD REPLY

I have the same issue: I am trying to get my .ped file from an .lgen file, but I get the error below:

Error: Duplicate ID '100 100'.

I appreciate any help.

C:\Users\fadhl\Dropbox\plink_win64>plink --lfile Plate3_final_report --recode
PLINK v1.90b3.41 64-bit (10 Sep 2016) https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
  --lfile Plate3_final_report
  --recode

16322 MB RAM detected; reserving 8161 MB for main workspace.
Error: Duplicate ID '100 100'.
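
The message suggests that the family ID / individual ID pair "100 100" occurs more than once among the sample IDs PLINK loaded, most likely in the .fam file. One possible cause (an assumption, not something visible in the log) is a .fam file built with one row per genotype line instead of one row per sample. A small Python sketch to list repeated pairs, with `duplicate_fam_ids` as a hypothetical helper name:

```python
from collections import Counter

def duplicate_fam_ids(fam_lines):
    """Return {(FID, IID): count} for every ID pair that occurs more than once."""
    counts = Counter(
        tuple(line.split()[:2]) for line in fam_lines if line.strip()
    )
    return {ids: n for ids, n in counts.items() if n > 1}
```

If the result is non-empty, keeping a single .fam row per sample should be the first thing to try.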

13.1 years ago
Biomed 5.0k

GenomeStudio has an export module that creates PLINK input files from your SNP data. http://www.illumina.com/Documents/products/technotes/technote_cnv_algorithms.pdf

ADD COMMENT

You can find the module here and it's super easy to export to plink format.

13.1 years ago
P.NJ ▴ 50

When converting an Illumina 2.5M array to PLINK format, should the output not contain roughly 2-2.5M SNPs? I am getting only about 1M SNPs. Is this expected?
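
One way to narrow this down is to count the distinct SNP names at each stage (the .lgen file, the .map file, the PLINK output) and see where the number drops. A minimal sketch, assuming whitespace-separated files; `count_distinct` is a hypothetical helper, and the SNP name is column 2 (0-based) in a .lgen file and column 1 in a .map file.

```python
def count_distinct(lines, column):
    """Count distinct values in one whitespace-separated column (0-based index)."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            seen.add(fields[column])
    return len(seen)
```

If the .lgen file already holds only about 1M distinct names, the SNPs were lost before PLINK was involved, for instance at export time.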

8.1 years ago
forever ▴ 80

My final report header looks like:

[Header]
GSGT Version 1.9.4
Processing Date 8/1/2012 2:35 PM
Content Cardio-Metabo_Chip_11395247_A.bpm
Num SNPs 196725
Total SNPs 196725
Num Samples 60
Total Samples 60
[Data]
SNP Name Sample ID Allele1 - Top Allele2 - Top GC Score SNP
chr1:109457160 2 C C 0.8609 [T/G]

I do not have a SNP_map file. How did you do it?
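
For SNP names that encode the position, such as "chr1:109457160" above, the .map columns can in principle be derived from the name itself. The sketch below is a hedged example of that idea only: `map_row_from_name` is a hypothetical helper, the name pattern is an assumption based on the single row shown, and the genome build those coordinates refer to is not known from the report. Names that carry no position (rs IDs, for instance) would still need the array manifest or a SNP map export.

```python
import re

# Assumed name pattern: "chr<chromosome>:<position>", e.g. "chr1:109457160".
_NAME = re.compile(r"^chr([0-9]+|X|Y|XY|MT?):([0-9]+)$", re.IGNORECASE)

def map_row_from_name(snp_name):
    """Return a .map row (chrom, SNP, cM, bp), or None if no position is encoded."""
    m = _NAME.match(snp_name)
    if m is None:
        return None
    # Genetic distance is set to "0", as in the .map example earlier in the thread.
    return (m.group(1), snp_name, "0", m.group(2))
```

Rows that come back as `None` should be collected and resolved from another source rather than dropped silently.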
