Question

Obtain population information for the 1092 samples of the 1000 Genomes Project


4.8 years ago

lxiao63 • 0

I have downloaded genomic data for the 1000 Genomes Phase 1 samples from https://www.ncbi.nlm.nih.gov/projects/faspftp/1000genomes/.

I checked the resulting .FAM file (1092 rows, each corresponding to one sample in the 1000 Genomes Phase 1 release) and noticed a column named member, whose first 20 entries are:

HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00104 HG00106 HG00108 HG00109 HG00110 HG00111 HG00112 HG00113 HG00114 HG00116 HG00117 HG00118 HG00119

I wish to determine the population (e.g., CHB, JPT, CEU) and super population (e.g., EAS, EUR, AFR) from the member IDs. To do so, I downloaded the pedigree file from https://www.internationalgenome.org/faq/can-i-get-phenotype-gender-and-family-relationship-information-samples/.

The pedigree file has 3501 rows rather than 1092. It has a column named Individual ID whose contents are HG01879, HG01880, HG01881, etc. However, none of the members in my .FAM file can be found among the 3501 rows of the pedigree file! The two files appear to be completely unrelated.

I would like to ask whether it is possible to determine the population source of the 1092 1000 Genomes samples from their member IDs. If so, where can I find the metadata that relates each ID to its population source?

Thank you.

1000 genome project • 937 views


score 0 · Answer 1 · 2020-02-21


4.8 years ago

JC 13k

You can use the 1000 Genomes data portal, where each sample ID can be looked up together with its population and super population.
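Once a per-sample table has been exported from the portal as a tab-separated file, relating the member IDs to populations is a simple lookup. The following is only a minimal sketch under a few assumptions: the .FAM file is a standard headerless PLINK file with the sample ID in its second column, and the exported table has columns named "Sample name", "Population code" and "Superpopulation code" (adjust these if your export uses different headers). The function names and the small in-memory inputs are hypothetical and stand in for the real files.

```python
import csv
import io


def read_fam_ids(fam_handle):
    """Collect sample IDs from the 2nd column of a headerless PLINK .fam file."""
    ids = []
    for line in fam_handle:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.append(fields[1])
    return ids


def read_population_table(tsv_handle,
                          id_col="Sample name",              # assumed header
                          pop_col="Population code",         # assumed header
                          superpop_col="Superpopulation code"):  # assumed header
    """Map sample ID -> (population, super population) from a tab-separated table."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row[id_col]: (row[pop_col], row[superpop_col]) for row in reader}


def annotate(ids, table):
    """Pair every ID with its populations; IDs missing from the table get None."""
    return [(sample_id, *table.get(sample_id, (None, None))) for sample_id in ids]


# Toy inputs standing in for the real files (replace with open("your.fam") etc.).
# XX00001 is a made-up ID used to show what an unmatched sample looks like.
fam_text = (
    "HG00096 HG00096 0 0 1 -9\n"
    "HG00097 HG00097 0 0 2 -9\n"
    "XX00001 XX00001 0 0 0 -9\n"
)
panel_text = (
    "Sample name\tPopulation code\tSuperpopulation code\n"
    "HG00096\tGBR\tEUR\n"
    "HG00097\tGBR\tEUR\n"
)

ids = read_fam_ids(io.StringIO(fam_text))
table = read_population_table(io.StringIO(panel_text))
annotated = annotate(ids, table)

for sample_id, pop, superpop in annotated:
    print(sample_id, pop, superpop)
```

Any ID that comes back with None is absent from the metadata table, which is a quick way to check whether the table you downloaded actually covers the Phase 1 samples.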
