Converting between Impute2 and Ped/Map after imputation - 1st column dashes (--) problem
12.8 years ago

Dear All,

I have taken a PLINK PED/MAP file and converted it into a .gen/.sample file with gtool. The result looks like this:

pkd@bioinform:~/strand_correct_script/Files_during_updating$ head controls.gen | cut -d " " -f 1-20

5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
5 rs1421911 96000947 C T 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0
5 rs6860934 96001842 C T 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1

I understand from the IMPUTE website that the first column should be SNP1, SNP2, SNP3 and so on, but I pushed on, thinking things might sort themselves out. I then imputed with IMPUTE2 against 1000 Genomes, which produced a .gen file in this format:

pkd@bioinform:~/Impute2/converting_back_to_plink$ head European_imputed_controls.gen | cut -d " " -f 1-20

--- 5-96000097 96000097 A G 1 0 0 1 0 0 0.976 0.024 0 1 0 0 1 0 0
--- 5-96000203 96000203 C T 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
--- 5-96000264 96000264 C T 1 0 0 1 0 0 0.998 0.002 0 1 0 0 1 0 0
--- rs7733671 96000269 G A 0 1 0 0 1 0 0 0.947 0.052 1 0 0 0 0 1
--- 5-96000338 96000338 C A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
--- rs73774358 96000463 A G 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0
--- 5-96000525 96000525 G A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
--- rs73774359 96000658 A C 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0

When I tried to convert it back to PLINK PED/MAP, PLINK complained that all the SNPs had the same name, "---", and crashed in flames. I have read the gtool site and cannot find any reference to what it puts in the first column when converting from PLINK to GEN/SAMPLE, or to what IMPUTE2 should put there when it imputes new SNPs. I can load the file into R and insert an arbitrary first column, but I was wondering whether this is necessary or whether I have made an error somewhere.

Thank you in advance.

Philip

plink imputation • 17k views
10.7 years ago

PLINK 1.9 has --recode oxford for direct export, and --data/--gen/--bgen/--sample for import. --hard-call-threshold can be used to set a genotype likelihood cutoff, or randomize genotypes based on the likelihoods, during import.
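
As a rough sketch of that import route, the command below reads an imputed .gen/.sample pair and writes PED/MAP. The file names and output prefix are assumptions taken from the question (controls.sample in particular is a guess at the sample file's name). --oxford-single-chr is added on the assumption that the file holds a single chromosome and that its first column ("---") is not a usable chromosome code. The threshold value 0.1 is only an illustration.

```shell
#!/bin/sh
# Sketch only: file names, chromosome and output prefix are assumptions.
chr=5
gen=European_imputed_controls.gen
sample=controls.sample

# --oxford-single-chr tells PLINK 1.9 to ignore the first .gen column and
# assign every variant to the given chromosome.
cmd="plink --gen $gen --sample $sample --oxford-single-chr $chr --hard-call-threshold 0.1 --recode --out chr${chr}_imputed"
echo "$cmd"

# Run the command only if plink is installed and the input files exist.
if command -v plink >/dev/null 2>&1 && [ -f "$gen" ] && [ -f "$sample" ]; then
    $cmd
fi
```

Replacing --recode with --make-bed would write a binary fileset instead of PED/MAP.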

12.8 years ago
Caddymob ★ 1.0k

I have had this problem... I wrote a quick and dirty Perl script to get past it. Nothing fancy, but it works. It takes the map file and the chromosome as arguments and prints the fixed map to stdout. It is meant to be run per chromosome (I split these up on a cluster computer), but hopefully this gets you going.

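#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl <map file> <chromosome>   (fixed map is printed to stdout)
my ($file, $chr) = @ARGV;
die "Usage: $0 <map file> <chromosome>\n" unless defined $chr;

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my %snp_hash;    # how many times each SNP ID has been seen so far

while (<$fh>) {
    chomp;
    # MAP columns: chromosome, SNP ID, genetic distance, position
    my (undef, $SNP, undef, $POS) = split;
    # SNPs without a name ("---") get a chr:pos ID
    $SNP = "$chr:$POS" if $SNP =~ /---/;
    # Repeated IDs get a numeric suffix (.2, .3, ...) so they stay unique
    if (exists $snp_hash{$SNP}) {
        $snp_hash{$SNP}++;
        $SNP .= '.' . $snp_hash{$SNP};
    }
    $snp_hash{$SNP}++;
    # Chromosome comes from the command line; genetic distance is set to 0
    print "$chr\t$SNP\t0\t$POS\n";
}
close($fh);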
12.8 years ago

Have you tried gtool? It can convert IMPUTE2 output to PED/MAP format.

gtool -G --g file1 --s file2.sample --ped file3.ped --map file4.map --phenotype phenotype_1 --threshold 0.95 > output.gtool

---EDIT---

chr=1

awk -v var1=$chr '{
ORS = ""
print var1"\t"
if ($2 == "---") print "SNP."var1"."$4"\t"
else print $2"\t"
print $3"\t"
print $4"\n"
}' Chr${chr}.IMPUTE2.map > Chr${chr}.IMPUTE2.V2.map

I think this script is less complex. You can use it in a for loop, with each chromosome in a separate file.
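
The per-chromosome loop could look something like the sketch below. It assumes the map files follow the Chr${chr}.IMPUTE2.map naming used in the awk command. The two demo files and the chromosome list (5 and 6) are invented purely so that the sketch runs on its own; with real data you would drop the demo section and loop over your own chromosomes.

```shell
#!/bin/sh
# Sketch only: the demo data below is made up so the loop can run stand-alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny MAP-style files (chromosome, SNP ID, genetic distance, position).
cat > Chr5.IMPUTE2.map <<'EOF'
--- --- 0 96000097
--- rs7733671 0 96000269
EOF
cat > Chr6.IMPUTE2.map <<'EOF'
--- --- 0 1000
EOF

for chr in 5 6; do
    awk -v chrom="$chr" 'BEGIN { OFS = "\t" } {
        # Name unnamed SNPs SNP.<chr>.<pos>; keep existing IDs unchanged.
        id = ($2 == "---") ? "SNP." chrom "." $4 : $2
        print chrom, id, $3, $4
    }' "Chr${chr}.IMPUTE2.map" > "Chr${chr}.IMPUTE2.V2.map"
done

cat Chr5.IMPUTE2.V2.map Chr6.IMPUTE2.V2.map
```

Note that, like the one-liner, this does not check for duplicate IDs, so two unnamed SNPs at the same position would still end up with the same name.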

Thanks for the comment.


He did use gtool; the problem is that SNPs get called "---" in the map if there is no rsID. My script above converts these SNPs to chr:pos format to give them a unique ID and get you through PLINK.

10.7 years ago
Kantale ▴ 140

Also take a look at this python implementation:

http://www.pypedia.com/index.php/Convert_impute2_gprobs_to_PEDMAP_beagle_user_Kantale

Of its long parameter list, you only need to define the following parameters:

chromosome, input_impute2_gprobs_filename, input_impute2_info_filename, output_TPED_filename, output_TFAM_filename

Note: The output format is transposed PED/MAP files. You can use these files directly in plink with the --tfile parameter: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
