What is the best source of metadata about the 1000 Genomes samples (genomes)?
I can't make much sense of 201012141000genomessamples.xls, so I am asking for a better source.
More specifically, I find these cases confusing:
unrel/duo and unrel/trio entries (is HG00144 the mother of HG00146 and HG00147? If yes - why? How come NA19625 is the only sample in family 2357-01, but is marked "trio"?):
HG00144 GBR SRS006877 female Mother unrel/duo
HG00146 GBR SRS006879 female Sibling unrel/duo
HG00147 GBR SRS006880 female Sibling unrel/duo
NA19625 ASW SRS003634 female child unrel/trio
"not father" cases - is that when the biological father is not the husband? If so, why is he listed under that family ID?
NA18510 YRI SRS000103 Y010-03 male not father unrel
type=unrel for clearly related samples:
NA11932 CEU SRS001261 1424-13 male mat grandfather unrel
NA11933 CEU SRS001262 1424-14 female mat grandmother unrel
Regarding the example above, two more points are unclear: a) do the numbers after the dash (as in 1424-13, 1424-14) matter? and b) what is the purpose of replacing father and mother with (maternal|paternal) grand(father|mother), if there is no way to link these grandparents to their children and grandchildren?
Also, what would "addnl related" mean in the unrel/duo/trio column?
Ok, I've figured one out:
trios with no obvious "third one", like in family SH071 (there are many of these):
Here, the problem is that only parents were sequenced, with children likely postponed to a later phase. That is why there are many "trios" with only parents present.
At least for the 1424- family, "unrel" is explained by the fact that only the "father" and his grandparents are related; since neither the mother nor the children are listed, the maternal grandparents can be viewed as unrelated. This also partially answers my question b) under the 1424- example.
Many collections are trios because, although the low-coverage sequencing covers only unrelated individuals, the presence of the child from each trio will be useful for validation purposes.
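To check how many families labelled as trios are only partly represented in the sheet, something like the following rough sketch might help. It rests on several assumptions: that the sheet has been exported as tab-separated text, that the columns come in the order seen in the rows quoted above (sample, population, SRS accession, family-member ID, gender, relationship, type), and that the part of the ID before the last dash identifies the family. The function names are made up for illustration, and the rows are the ones quoted in this post.

```python
from collections import defaultdict

# Rows quoted in this post, written as tab-separated text.
# ASSUMPTION: this column order is inferred from those rows, not from any
# official description of the spreadsheet.
ROWS = (
    "NA11932\tCEU\tSRS001261\t1424-13\tmale\tmat grandfather\tunrel\n"
    "NA11933\tCEU\tSRS001262\t1424-14\tfemale\tmat grandmother\tunrel\n"
    "NA18510\tYRI\tSRS000103\tY010-03\tmale\tnot father\tunrel\n"
    "NA19625\tASW\tSRS003634\t2357-01\tfemale\tchild\tunrel/trio\n"
)

COLUMNS = ("sample", "population", "srs", "member_id",
           "gender", "relationship", "type")


def parse_rows(text):
    """Turn tab-separated lines into a list of dicts keyed by COLUMNS."""
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        samples.append(dict(zip(COLUMNS, line.split("\t"))))
    return samples


def group_by_family(samples):
    """Group samples by family.

    ASSUMPTION: the family is the part of the ID before the last dash,
    e.g. '1424-13' belongs to family '1424'.
    """
    families = defaultdict(list)
    for sample in samples:
        family = sample["member_id"].rsplit("-", 1)[0]
        families[family].append(sample)
    return dict(families)


def incomplete_trios(families):
    """Return {family: [relationships present]} for families that have a
    member typed as 'trio' but fewer than three members in the sheet."""
    result = {}
    for family, members in families.items():
        is_trio = any("trio" in m["type"] for m in members)
        if is_trio and len(members) < 3:
            result[family] = [m["relationship"] for m in members]
    return result


if __name__ == "__main__":
    families = group_by_family(parse_rows(ROWS))
    for family, present in incomplete_trios(families).items():
        print(family, present)
```

Run over the full sheet, this would list both the parents-only families such as SH071 and single-sample "trios" such as NA19625, together with the relationships that are actually present.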