Hi all,
I'm about to start working with the data available on Impute. My collaborator is currently away from his mail.
From this page, I downloaded 1kG_b36_aug09_ceu.tgz (132 MB).
This archive contains a 'legend' file, which lists SNP/position/Allele-0/Allele-1:
head 1kG_b36_aug09_ceu_chr10.legend
rsID position a0 a1
rs61838558 54767 C T
rs28887774 55878 C G
rs12262442 56397 C T
rs4121579 56695 T A
10-57163 57163 G A
rs9943471 57774 G C
rs35819232 58533 T G
rs11253482 58575 C T
rs34829118 59071 G A
The second type of file is the hap file:
head -n2 1kG_b36_aug09_ceu_chr10.hap
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
Here, the number of rows is the number of SNPs in the legend file. OK.
But I wonder how those 112 columns should be read. They presumably describe the haplotypes of each sample (where can I get the pedigree?), but what do they mean? Should I read each pair of numbers as the state of the current SNP on both chromosomes, or is it something else?
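To make the question concrete, here is a minimal Python sketch of the reading I have in mind. It rests on two assumptions that I have not confirmed and that should be checked against the Impute documentation: that 0 stands for the legend's a0 allele and 1 for a1, and that each consecutive pair of columns holds the two haplotypes of one sample. The function names are hypothetical, and the hap rows in the usage part are toy data, not values from the real file.

```python
from typing import List, Tuple

LegendRow = Tuple[str, int, str, str]  # (rsID, position, a0, a1)


def parse_legend(lines: List[str]) -> List[LegendRow]:
    """Parse legend lines: a header, then 'rsID position a0 a1' rows."""
    records = []
    for line in lines[1:]:  # skip the header line
        fields = line.split()
        if not fields:
            continue
        rsid, pos, a0, a1 = fields[:4]
        records.append((rsid, int(pos), a0, a1))
    return records


def parse_hap(lines: List[str]) -> List[List[int]]:
    """Parse hap lines: one row per SNP, space-separated 0/1 values."""
    return [[int(x) for x in line.split()] for line in lines if line.strip()]


def decode_genotypes(legend: List[LegendRow], hap: List[List[int]]):
    """Turn 0/1 codes into allele pairs, one pair per sample.

    ASSUMPTION: 0 -> a0, 1 -> a1, and columns (2i, 2i+1) belong to sample i.
    """
    if len(legend) != len(hap):
        raise ValueError("legend and hap must have the same number of SNPs")
    result = []
    for (rsid, _pos, a0, a1), row in zip(legend, hap):
        if len(row) % 2 != 0:
            raise ValueError("expected an even number of haplotype columns")
        alleles = [a1 if value else a0 for value in row]
        pairs = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        result.append((rsid, pairs))
    return result


if __name__ == "__main__":
    legend_lines = [
        "rsID position a0 a1",
        "rs61838558 54767 C T",
        "rs28887774 55878 C G",
    ]
    hap_lines = ["1 0 0 1", "0 0 1 1"]  # toy data: 2 SNPs x 4 haplotypes
    decoded = decode_genotypes(parse_legend(legend_lines), parse_hap(hap_lines))
    for rsid, pairs in decoded:
        print(rsid, pairs)
```

If that reading is right, the 112 columns would correspond to 56 samples, each with two haplotypes.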
Thank you for your help
Pierre
Many thanks, Lars :-) As usual, I should have RTFM :-)