Convert Plink Ped Format Into Hapmap Format?
3
4
Entering edit mode
13.1 years ago

Is there a way to convert from ped PLINK format into HapMap genotype format? I've got PED PLINK files that I want to analyse with a program that requires HapMap genotype format (the examples below are from HapMap, so I am guessing it's doable to convert them):

From this example file ([...] means more of the same):

wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/hapmap3_r2_b36_fwd.consensus.qc.poly.ped.bz2 | bunzip2 -c | head -n 1 | less

2427    NA19919 NA19908 NA19909 1       -9      C C     C C     C C     A A     G G     G G     C C     T T     G G   [...]   A A

To this file format ([...] means more of the same):

wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/hapmap_format/consensus/genotypes_chr10_ASW_phase3.2_consensus.b36_fwd.txt.gz | gunzip -c | head

center protLSID assayLSID panelLSID QCcode NA19625 NA19700 NA19701 NA19702 NA19703 NA19704 [...] NA20364
rs12255619 A/C chr10 88481 + ncbi_b36 bbs urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft1:1 urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft1_rs12255619:1 urn:lsid:dcc.hapmap.org:Panel:US_African-30-trios:3 QC+ AA AA AC AA AA AA [...] AA

For more details on the HapMap format:

-HapMap file format:
The current release consists of text-table files only, with the following columns:

Col1: refSNP rs# identifier at the time of release (NB: it might merge 
  with another rs# in the future)
Col2: SNP alleles according to dbSNP
Col3: chromosome that SNP maps to 
Col4: chromosome position of SNP, in basepairs on reference sequence
Col5: strand of reference sequence that SNP maps to
Col6: version of reference sequence assembly (currently NCBI build36)
Col7: HapMap genotyping center that produced the genotypes
Col8: LSID for HapMap protocol used for genotyping
Col9: LSID for HapMap assay used for genotyping
Col10: LSID for panel of individuals genotyped
Col11: QC-code, currently 'QC+' for all entries (for future use)
Col12 and on: observed genotypes of samples, one per column, sample
     identifiers in column headers (Coriell catalog numbers, example:
      NA10847).
hapmap genotyping plink • 16k views
ADD COMMENT
0
Entering edit mode

Hi Pierre,

Have you ever try paraHaplo software? I would like to use it because of its capabality to multiprocessing with MPI. I have used your script converttpedto_hapmap.pl to convert my tped data file to hapmap format but I meet a problem.

When I'm trying to run the haplotypePhasing module with command "../../src/SNP/bin/haplotypePhasing.exe ../../data/test.tped.hapmap hblockdef.txt phasedout_test.txt 1 9331"

I have got such error "input file open error! FileName:phasedouttest.txt_output0"

Unfortunately I dont know what could be a real problem. Maybe it is paraHaplo problem or hapMap format generated by your scritp. So, I wondering if you have any experience with paraHaplo?

ADD REPLY
0
Entering edit mode

The files could also come from a vcf source:

~/src/tabix-0.2.6/bgzip -c test.100000.vcf > test.100000.vcf.gz ~/src/vcftools_0.1.8a/bin/vcftools --gzvcf test.100000.vcf.gz --plink-tped

Then converting the tped files into hapmap could be done with the answers in this biostar question.

ADD REPLY
5
Entering edit mode
13.1 years ago
Davy ▴ 210

Python and R will allow to transpose your data set. very easy in R

data <- t(data)

Not sure about python. Depends on the data structures your using but let's say an array, then

data = transpose(data)

Plink also gives you an option to transpose your ped files.

plink --file mydata --transpose --out mytransposeddata

That gives you snps in rows and individuals in columns.You can then just add in the SNP, chromosome, and bp columns from the map file that plink uses.

Again, if you are using R you can use biomaRt to look up the information of all the SNPs you have in your file. If you don't know R or biomaRt, UCSC genome browser would be a good bet. It can give you tons of infromation on a predefined list of SNPs, or even all SNPs within specified regions.

Some columns you will have to fill in yourself like the protocols and assays, but assuming that that wouldn't be an extensive list, it is still pretty easy in R.

Hope this helps somehow.

ADD COMMENT
5
Entering edit mode
13.1 years ago

Hello,

I have made a lot of file format conversion in the past between ped/HapMap/impute/beagle etc, and have a lot of code for this. I made scripts in the past that dealed with the transposition themselves but once plink became the standard tool for GWAS y delegate the transposition ped/tped to plink and then use ped or tped for the format conversion depending the orientation of the destination format.

For converting ped to HapMap I convert first to tped with plink:

plink --noweb --file hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds --recode --transpose --out hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds`

And then use my script convert_tped_to_hapmap.pl.

perl /soft_bio/PMG/perl/convert_tped_to_hapmap.pl --tped hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tped --tfam hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tfam --build=ncbi_36

I have made a gist where you can download the script


    
      
        
  
    
    

        


  


  
        
          
          #!/usr/bin/env perl
        
        
          
          

        
        
          
          

        
        
          
          =head1 [progam_name]
        
        
          
          

        
        
          
           description:
        
        
          
             - converts tped files to hapmap files.
        
        
          
          

        
        
          
          

        
        
          
          =head2 usage
        
        
          
          

        
        
          
            # convert tped to HapMap
        
        
          
            perl /soft_bio/PMG/perl/convert_tped_to_hapmap.pl --tped hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tped --tfam hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tfam --build=ncbi_36
        
        
          
          

        
        
          
           # See documentation
        
        
          
           perldoc convert_tped_to_hapmap.pl
        
        
          
          

        
        
          
          

        
        
          
          =head2 comments
        
        
          
          

        
        
          
          Pedigree files used by plink (.ped) are individual oriented (individuals in rows) but HapMap are SNP oriented (SNPs in rows). For converting from any SNP oriented format from or to plink, the best approach is to use tped (transposed ped) and then use plink to transpose to/from ped.
        
        
          
          

        
        
          
          

        
        
          
          For converting from ped to tped:
        
        
          
          

        
        
          
            - Download 100 individuals from hapampIII in ped format
        
        
          
            wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/hapmap3_r2_b36_fwd.consensus.qc.poly.ped.bz2 | bunzip2 -c | head -n 100 > hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.ped
        
        
          
          

        
        
          
            - Download the map file
        
        
          
            wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/hapmap3_r2_b36_fwd.consensus.qc.poly.map.bz2 |  bunzip2 -c > hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.map
        
        
          
          

        
        
          
            # the .map has the '--100_inds' flag only to have the same base name that the .ped file.
        
        
          
          

        
        
          
          

        
        
          
            - convert to tped
        
        
          
             plink --noweb --file hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds --recode --transpose --out hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds
        
        
          
          

        
        
          
            - convert tped to hapmap
        
        
          
            perl /soft_bio/PMG/perl/convert_tped_to_hapmap.pl --tped hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tped --tfam hapmap3_r2_b36_fwd.consensus.qc.poly--100_inds.tfam --build=ncbi_36
        
        
          
          

        
        
          
          

        
        
          
          =head2 file type description
        
        
          
          

        
        
          
            tped files:
        
        
          
          

        
        
          
               - 4 columns (like map)
        
        
          
               - 5th until end => genotypes (two columns per ind). unknown = 0 or N
        
        
          
          

        
        
          
            tfam file
        
        
          
          

        
        
          
                Family ID
        
        
          
                Individual ID
        
        
          
                Paternal ID
        
        
          
                Maternal ID
        
        
          
                Sex (1=male; 2=female; other=unknown)
        
        
          
                Phenotype
        
        
          
          

        
        
          
          

        
        
          
            map
        
        
          
             - 4 columns
        
        
          
               - chr
        
        
          
               - snp_name
        
        
          
               - cM
        
        
          
               - chr_pos
        
        
          
          

        
        
          
          

        
        
          
           hapmap_files
        
        
          
          

        
        
          
                  -HapMap file format:
        
        
          
                  The current release consists of text-table files only, with the following columns:
        
        
          
          

        
        
          
                  Col1: refSNP rs# identifier at the time of release (NB: it might merge
        
        
          
                    with another rs# in the future)
        
        
          
                  Col2: SNP alleles according to dbSNP
        
        
          
                  Col3: chromosome that SNP maps to
        
        
          
                  Col4: chromosome position of SNP, in basepairs on reference sequence
        
        
          
                  Col5: strand of reference sequence that SNP maps to
        
        
          
                  Col6: version of reference sequence assembly (currently NCBI build36)
        
        
          
                  Col7: HapMap genotyping center that produced the genotypes
        
        
          
                  Col8: LSID for HapMap protocol used for genotyping
        
        
          
                  Col9: LSID for HapMap assay used for genotyping
        
        
          
                  Col10: LSID for panel of individuals genotyped
        
        
          
                  Col11: QC-code, currently 'QC+' for all entries (for future use)
        
        
          
                  Col12 and on: observed genotypes of samples, one per column, sample
        
        
          
                       identifiers in column headers (Coriell catalog numbers, example:
        
        
          
                        NA10847).
        
        
          
          

        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          #use feature ':5.10';
        
        
          
          use strict;
        
        
          
          #use warnings;
        
        
          
          #use Data::Dumper;
        
        
          
          

        
        
          
          use Getopt::Long;
        
        
          
          

        
        
          
          my $prog = $0;
        
        
          
          my $usage = <<EOQ;
        
        
          
          Usage for $0:
        
        
          
          

        
        
          
            >$prog [-test -help -verbose] --tped xx  --tfam yy --hapmap zz --build=ncbi_36 
        
        
          
          

        
        
          
          

        
        
          
            optional
        
        
          
            --------
        
        
          
          

        
        
          
            --panel=ncbi_36
        
        
          
            --strand_all_snps=+
        
        
          
          

        
        
          
          EOQ
        
        
          
          

        
        
          
          my $help;
        
        
          
          my $test;
        
        
          
          my $debug;
        
        
          
          my $verbose =1;
        
        
          
          

        
        
          
          my $bsub;
        
        
          
          my $log;
        
        
          
          my $stdout;
        
        
          
          

        
        
          
          my $file_tped;
        
        
          
          my $file_tfam;
        
        
          
          my $file_hapmap;
        
        
          
          my $file_pheno;
        
        
          
          my $file_map;
        
        
          
          my $build;
        
        
          
          my $strand_all_snps;
        
        
          
          my $panel;
        
        
          
          

        
        
          
          my $ok = GetOptions(
        
        
          
                              'test'              => \$test,
        
        
          
                              'debug:i'           => \$debug,
        
        
          
                              'verbose:i'         => \$verbose,
        
        
          
                              'h|help'            => \$help,
        
        
          
                              'log'               => \$log,
        
        
          
                              'bsub'              => \$bsub,
        
        
          
                              'stdout'            => \$stdout,
        
        
          
                              'tped=s'            => \$file_tped,
        
        
          
                              'tfam=s'            => \$file_tfam,
        
        
          
                              'hapmap=s'          => \$file_hapmap,
        
        
          
                              'pheno=s'           => \$file_pheno,
        
        
          
                              'map=s'             => \$file_map,
        
        
          
                              'strand_all_snps=s' => \$strand_all_snps,
        
        
          
                              'build=s'           => \$build,
        
        
          
                              'panel=s'           => \$panel,
        
        
          
                             );
        
        
          
          

        
        
          
          if ($help || !$ok ) {
        
        
          
              print $usage;
        
        
          
              exit;
        
        
          
          }
        
        
          
          

        
        
          
          unless ($build) {
        
        
          
              print "#[ERROR] Sorry but you need to document your hapmap genotypes with a genome build\n";
        
        
          
          

        
        
          
              print $usage;
        
        
          
              exit;
        
        
          
          }
        
        
          
          

        
        
          
          if (! $file_tped || ! $file_tfam ) {
        
        
          
              print "#[ERROR] Sorry but you need a tped and tfam file\n";
        
        
          
          

        
        
          
              print $usage;
        
        
          
              exit;
        
        
          
          }
        
        
          
          

        
        
          
          $file_hapmap ||=  $file_tped.'.hapmap';
        
        
          
          my $log_file  = $file_hapmap.'.log';
        
        
          
          

        
        
          
          # variable to store the log
        
        
          
          my $log_str;
        
        
          
          

        
        
          
          start_log("conversion ped to hapmap");
        
        
          
          

        
        
          
          main();
        
        
          
          

        
        
          
          end_log();
        
        
          
          

        
        
          
          sub main {
        
        
          
          

        
        
          
              # read each column from tped and convert to hapmap
        
        
          
          

        
        
          
              $DB::single=1;
        
        
          
              feedback_log("#[MSG] loading tfam file '$file_tfam'\n");
        
        
          
              my $tfam = load_tfam($file_tfam);
        
        
          
          

        
        
          
              $DB::single=1;
        
        
          
              open (my $tped_fh,   '<', $file_tped)   or die "Sorry unable to open tped file '$file_tped' for reading:$!\n";
        
        
          
              open (my $hapmap_fh, '>', $file_hapmap) or die "Sorry unable to open hapmap file '$file_hapmap' for writing:$!\n";
        
        
          
          

        
        
          
          

        
        
          
              # print the header of hapmap
        
        
          
              $DB::single=1;
        
        
          
              # expect ind ids to be unique, if you don't whant the id  merged with family id
        
        
          
              # just in case I am joining FID__IID iwth double underscore so it can be easily splitted
        
        
          
              my $inds = [map {$_->[0].'__'.$_->[1]} @$tfam];  # <TODO> CHECK
        
        
          
              print {$hapmap_fh} join("\t",qw(rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode), @$inds),"\n";
        
        
          
          

        
        
          
          

        
        
          
              feedback_log("#[MSG] processing tped file '$file_tped'.\n");
        
        
          
              while (my $line=<$tped_fh>) {
        
        
          
                  chomp $line;
        
        
          
          

        
        
          
                  # print a '.' every 500 SNPs
        
        
          
                  print {*STDERR} '.' unless ($. % 500);
        
        
          
          

        
        
          
                  my @items = split /\s+/, $line;
        
        
          
                  my $dat= {
        
        
          
                            '01_name'        => $items[1],
        
        
          
                            '02_alleles'     => get_alleles(\@items),
        
        
          
                            '03_chr'         => $items[0],
        
        
          
                            '04_pos'         => $items[3],
        
        
          
                            '05_strand'      => $strand_all_snps?  $strand_all_snps : '.',
        
        
          
                            '06_build'       => $build,
        
        
          
                            '07_geno_center' => '-',
        
        
          
                            '08_protocol'    => '-',
        
        
          
                            '09_assay'       => '-',
        
        
          
                            '10_panel'       => $panel? $panel:'-',
        
        
          
                            '11_qc_code'     => 'QC+',
        
        
          
                            '12_geno_str'    => get_geno_string(\@items, $inds),
        
        
          
                           };
        
        
          
                  $DB::single=1;
        
        
          
                  print {$hapmap_fh} join ("\t" , map{$dat->{$_}} sort keys %$dat). "\n";
        
        
          
          

        
        
          
              }
        
        
          
          

        
        
          
          

        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          

        
        
          
          =head2 get_geno_string
        
        
          
          

        
        
          
           Title   : get_geno_string
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub get_geno_string{
        
        
          
              my ($items, $inds) = @_;
        
        
          
          

        
        
          
              $DB::single=1;
        
        
          
              my @genos =  @$items[4..(@$items-1)];
        
        
          
              if (@genos % 2 ) {
        
        
          
                  # more columns that expected
        
        
          
                  die "#[ERROR] columns not multiplo of 2 in parsed ped file ' $file_tped' at line $.";
        
        
          
              }
        
        
          
              if (@genos/2 != @$inds) {
        
        
          
                  # error more geno than inds
        
        
          
                  die "#[ERROR]  different number of individuals and genotypes in parsed ped file ' $file_tped' at line $.: genos=".(@genos/2)." inds=".@$inds."\n";
        
        
          
              }
        
        
          
              
        
        
          
              return join ("\t", map {
        
        
          
                                       my $idx=$_*2;
        
        
          
                                       $genos[$idx+0].$genos[$idx+1]
        
        
          
                                   } 0..((@genos/2)-1)
        
        
          
                           )
        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 get_alleles
        
        
          
          

        
        
          
           Title   : get_alleles
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub get_alleles{
        
        
          
              my ($items) = @_;
        
        
          
              my %h; 
        
        
          
              map{$h{$_}++} @$items[4..(@$items-1)];
        
        
          
              return join('/',sort keys %h);
        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 write_log
        
        
          
          

        
        
          
           Title   : write_log
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    : $log_str   str with the message to log
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub write_log{
        
        
          
              my ($log_str) = @_;
        
        
          
          

        
        
          
              write_file($log_file, $log_str);
        
        
          
          

        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 feedback_log
        
        
          
          

        
        
          
           Title   : feedback_log
        
        
          
           Usage   :
        
        
          
           Function:  print to stderr and add tothe log_str the message. (The str should contain the \n if needed)
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    : $str = string
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub feedback_log{
        
        
          
              my ($str) = @_;
        
        
          
          

        
        
          
              print {*STDERR} $str;
        
        
          
          

        
        
          
              $log_str .= $str;
        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 start_log
        
        
          
          

        
        
          
           Title   : start_log
        
        
          
           Usage   :
        
        
          
           Function: start log file with the name of the files
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    : an extra message to add to the log starting
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub start_log{
        
        
          
              my ($title, $extra_msg) = @_;
        
        
          
          

        
        
          
              my $date = localtime;
        
        
          
          

        
        
          
          

        
        
          
              $title ||= "$0";
        
        
          
              my $out = "== LOG $title ==\n";
        
        
          
              $out .= 'Started at: ' . $date . "\n";
        
        
          
              $out .= "  hapmap file   : " . $file_hapmap . "\n";
        
        
          
              $out .= "  tped file  : " . $file_tped . "\n";
        
        
          
              $out .= "  tfam file  : " . $file_tfam . "\n";
        
        
          
              $out .= "  map file   : " . $file_map . "\n";
        
        
          
              $out .= "  pheno file : " . $file_pheno . "\n";
        
        
          
              $out .= "\n";
        
        
          
              $out .= $extra_msg;
        
        
          
          

        
        
          
              feedback_log($out);
        
        
          
          

        
        
          
          }
        
        
          
          

        
        
          
          =head2 end_log
        
        
          
          

        
        
          
           Title   : end_log
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub end_log{
        
        
          
              my ($extra_msg) = @_;
        
        
          
          

        
        
          
              my $date = localtime;
        
        
          
          

        
        
          
              my $out = 'ENDED at: ' . $date . "\n";
        
        
          
              $out .= $extra_msg;
        
        
          
              $out .= "== END ==\n";
        
        
          
              feedback_log($out);
        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          

        
        
          
          =head2 load_pheno_file
        
        
          
          

        
        
          
           Title   : load_pheno_file
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns : a hash with the IID as key and values. The two first columns should be FID and IID and the third is 'sex'
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub load_pheno_file{
        
        
          
              my ($file) = @_;
        
        
          
          

        
        
          
              my $pheno_hash = read_table_to_hash($file);
        
        
          
          

        
        
          
              return $pheno_hash;
        
        
          
          

        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 load_tfam
        
        
          
          

        
        
          
           Title   : load_tfam
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub load_tfam{
        
        
          
              my ($file_tfam) = @_;
        
        
          
          

        
        
          
              my $rows = read_table($file_tfam, 0);
        
        
          
          

        
        
          
          

        
        
          
              return $rows;
        
        
          
          

        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          =head2 read_table
        
        
          
          

        
        
          
           Title   : read_table
        
        
          
           Usage   :
        
        
          
           Function: parse files separated with tabs and bare columns (no quotes nor tabs to scape)
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    : $file
        
        
          
                     $has_header
        
        
          
          

        
        
          
          

        
        
          
            <TO_DO> think if we want to capture headers and return them??
        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub read_table{
        
        
          
              my ($file, $has_header) = @_;
        
        
          
          

        
        
          
              die "#[ERROR] no argument passed to sub read_table\n" unless $file;
        
        
          
              unless (-e $file) {
        
        
          
                  die "#[ERROR] Sorry but the file '$file' does not exist.\n";
        
        
          
              }
        
        
          
          

        
        
          
              open (my $filefh, '<',$file) or die "#[ERROR] unable to open file '$file' for reading:$!\n";
        
        
          
          

        
        
          
          

        
        
          
              my $count;
        
        
          
              my @rows;
        
        
          
              while( my $line=<$filefh>) {
        
        
          
          

        
        
          
                  next if $.==1 && $has_header;
        
        
          
          

        
        
          
                  chomp $line;
        
        
          
                  $count ++;
        
        
          
          

        
        
          
                  my @items = split /\s+/, $line;
        
        
          
                  push @rows, [@items];
        
        
          
          

        
        
          
              }
        
        
          
              feedback_log( "...end. $count lines processed\n");
        
        
          
          

        
        
          
              return \@rows;
        
        
          
          }
        
        
          
          

        
        
          
          =head2 read_table_to_hash
        
        
          
          

        
        
          
           Title   : read_table_to_hash
        
        
          
           Usage   :
        
        
          
           Function: creates a hash of hashes. A hash with keys the first column
        
        
          
               and values being another hash constructed with all columns ad keys
        
        
          
               taken from the header
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub read_table_to_hash{
        
        
          
              my ($file) = @_;
        
        
          
          

        
        
          
              # a header is mandatory
        
        
          
          

        
        
          
              die "#[ERROR] no argument passed to sub read_table\n" unless $file;
        
        
          
              unless (-e $file) {
        
        
          
                  die "#[ERROR] Sorry but the file '$file' does not exist.\n";
        
        
          
              }
        
        
          
          

        
        
          
              open (my $filefh, '<',$file) or die "unable to open $file_map for reading:$!\n";
        
        
          
          

        
        
          
              my @headers;
        
        
          
              my $count;
        
        
          
              my %hash;
        
        
          
              my $hc;
        
        
          
              while (my $line=<$filefh>) {
        
        
          
          

        
        
          
                  chomp $line;
        
        
          
                  $count ++;
        
        
          
          

        
        
          
                  my @items = split "\t", $line;
        
        
          
          

        
        
          
                  if ($.==1){
        
        
          
                      @headers = @items;
        
        
          
                      $hc = @headers;
        
        
          
                  }
        
        
          
                  else {
        
        
          
                      my $rc = @items;
        
        
          
                      die "#[ERROR] Sorry but line '$.' has wrong number of columns. header=$hc, this_row=$rc\n" unless $hc==$rc;
        
        
          
                      @{$hash{$items[0]}}{map{lc}@headers}=@items;
        
        
          
                  }
        
        
          
              }
        
        
          
              feedback_log( ".. $count lines processed for $file\n");
        
        
          
          

        
        
          
              return \%hash;
        
        
          
          }
        
        
          
          

        
        
          
          

        
        
          
          

        
        
          
          =head2 final_qc
        
        
          
          

        
        
          
           Title   : final_qc
        
        
          
           Usage   :
        
        
          
           Function:
        
        
          
           Example :
        
        
          
           Returns :
        
        
          
           Args    :
        
        
          
          

        
        
          
          

        
        
          
          =cut
        
        
          
          

        
        
          
          sub final_qc{
        
        
          
              my ($ind_ids,  $pheno_hash,  $map_hash) = @_;
        
        
          
          

        
        
          
          

        
        
          
          }
        
  



    

  


      
      
        view raw
        
          convert_tped_to_hapamp.pl
        
        hosted with ❤ by GitHub
      
    


git://gist.github.com/2069211.git

exceute perldoc <path to the script> to see the documentation and how to download the files for testing it.

This script contains the basics of the conversion (I have removed some SNP QC checking and the strand qc that connects to my own databases.)

NOTE: For converting to HapMap I will suggest first to flip the SNPs in ped to forward strand, so you can put all SNPs in the HapMap with 'strand +', and also you need to decide how to create the individual name. Usually you will have unique names for your whole study so you don't need to concatenate FamilyID and IndividualID. When working on studies with unrelated individuals when all IID are unique, sometimes the FID is equal to the FID and if you concatenate them, the name in HapMap becomes a ugly tandem duplication. For this general conversion tool, I am concatenating FID and IID with '__' (double underscore).

ADD COMMENT
2
Entering edit mode
13.1 years ago

Ok, since nobody has answered this question I suppose that there is no specific for this. Here is how I would execute this task.

Use a key/value datastore (leveldb, berkeleydb, etc...), and fill the following tables.

once your tables are filled:

for(col-index:  the markers)
    for(indi-name: individuals)
         print get(indi-name).genotypes[col-index];
ADD COMMENT

Login before adding your answer.

Traffic: 1854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6