Question

Subjects In 1000 Genome Removed In Recent Releases

2


12.2 years ago

David Huen ▴ 20

I notice that the 20100804 release of the 1000 genomes genotypes has subjects that are missing from the later releases (e.g. NA06985).

Would anyone know why data on those subjects were culled?

genotyping 1000genomes • 2.1k views


score 1 · Answer 1 · 2012-09-26

Check the changelogs on the 1000 Genomes FTP site. Better still, check the sequence.index file: it lists all the samples and has a "Withdrawn" column (column 21) that tells you whether an entry has been withdrawn, and a "withdrawn date" column (column 22) that tells you when it was removed. Once you know the date, you can find the changelog that matches it and work out why. Data are generally withdrawn because the samples turned out to be misidentified, contaminated, or badly sequenced.
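As a rough illustration of the lookup described above, here is a minimal Python sketch that scans a sequence.index file and collects the withdrawal dates per sample. Columns 21 and 22 come from the answer; the tab-separated layout, the sample name being in column 10, and a flag value of "1" meaning withdrawn are assumptions you should check against the header of your copy. The function name `withdrawn_samples` is made up for this example.

```python
from collections import defaultdict


def withdrawn_samples(lines, sample_col=10, withdrawn_col=21, date_col=22):
    """Return {sample name: sorted list of withdrawal dates}.

    Column numbers are 1-based. sample_col=10 and the "1" flag value are
    assumptions about the sequence.index layout, not confirmed facts.
    """
    needed = max(sample_col, withdrawn_col, date_col)
    found = defaultdict(set)
    for line in lines:
        # Strip only line endings so trailing empty fields are preserved.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < needed:
            continue  # malformed or short line
        # The header row is skipped naturally: its flag is text, not "1".
        if fields[withdrawn_col - 1].strip() == "1":
            found[fields[sample_col - 1]].add(fields[date_col - 1])
    return {sample: sorted(dates) for sample, dates in found.items()}
```

To use it, open your local copy of the index file, pass the file handle to `withdrawn_samples`, and look up a sample such as NA06985 in the returned dictionary. The dates you get back can then be matched against the changelog entries.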

score 0 · Answer 2 · 2012-09-26


12.2 years ago

David Huen ▴ 20

Thanks for the tip, which I have followed up. Those subjects were apparently marked "SUPPRESSED IN ARCHIVE", which means "The run has been suppressed by the submitter in the archive". Tracing back through the changelogs pointed to a 20101123 sequence index change where those sequences were marked as FAILED GENOTYPE QC. It would be easier if that reason were persisted.

Thanks again for the pointer above.


0


No problem. You could email the info email address at 1000genomes.org to point out that maintaining that information in future would be useful to you/others. The feedback on what information users care about is valued by them.

12.2 years ago by zam.iqbal.genome ★ 1.9k