Subjects in 1000 Genomes Removed in Recent Releases
12.2 years ago
David Huen

I notice that the 20100804 release of the 1000 Genomes genotypes contains subjects that are missing from later releases (e.g. NA06985).

Does anyone know why the data on those subjects was removed?
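One way to list which subjects disappeared between two releases is to compare the sample columns of their genotype files. This is a minimal sketch assuming both releases are distributed as VCF files, where the `#CHROM` header line has nine fixed columns followed by one column per sample; the file names in the usage comment are hypothetical.

```python
import gzip
from typing import Iterable, List


def samples_from_header(lines: Iterable[str]) -> List[str]:
    """Return the sample IDs listed on the '#CHROM' line of VCF text.

    The '#CHROM' line has nine fixed columns (CHROM, POS, ID, REF, ALT,
    QUAL, FILTER, INFO, FORMAT) followed by one column per sample.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data lines without seeing a header
    raise ValueError("no #CHROM header line found")


def vcf_samples(path: str) -> List[str]:
    """Read sample IDs from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return samples_from_header(handle)


def missing_samples(old: Iterable[str], new: Iterable[str]) -> List[str]:
    """Sorted sample IDs present in the old release but absent from the new one."""
    return sorted(set(old) - set(new))


# Usage (hypothetical file names):
# old = vcf_samples("genotypes_20100804.vcf.gz")
# new = vcf_samples("genotypes_later_release.vcf.gz")
# print(missing_samples(old, new))
```

Only the header is read, so this stays fast even on large genotype files.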

genotyping 1000genomes
12.2 years ago

Check the changelogs on the 1000 Genomes FTP site. Better still, check the sequence.index file: it lists all the samples and has a "Withdrawn" column (column 21) that tells you whether a run has been withdrawn, and a "withdrawn date" column (column 22) that tells you when it was removed. Once you know the date, you can find the changelog that matches it and work out why. Samples are generally withdrawn because they turn out to be misidentified, contaminated, or poorly sequenced.
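A minimal Python sketch of that lookup, reading sequence.index as tab-separated text and collecting the withdrawn dates per sample. The positions of the withdrawn flag (column 21) and withdrawn date (column 22) come from the answer above; the sample-name column (10) and the value `1` marking a withdrawn run are assumptions, so check them against the header line of the file you download.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# 0-based positions. WITHDRAWN (21) and WITHDRAWN_DATE (22) are from the answer;
# SAMPLE_NAME in column 10 is an assumption to verify against the file header.
SAMPLE_COL = 9
WITHDRAWN_COL = 20
WITHDRAWN_DATE_COL = 21


def withdrawn_dates(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each sample to the set of dates on which any of its runs was withdrawn."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= WITHDRAWN_DATE_COL:
            continue  # blank or truncated row
        # A header row is skipped naturally: its flag cell holds a column name.
        if fields[WITHDRAWN_COL] == "1":  # assumed flag value for a withdrawn run
            result[fields[SAMPLE_COL]].add(fields[WITHDRAWN_DATE_COL])
    return dict(result)


# Usage:
# with open("sequence.index") as fh:
#     for sample, dates in sorted(withdrawn_dates(fh).items()):
#         print(sample, sorted(dates))
```

Each distinct date printed is a candidate changelog to read for the withdrawal reason.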

12.2 years ago
David Huen

Thanks for the tip, which I have followed up. Those subjects were apparently "SUPPRESSED IN ARCHIVE", which means "The run has been suppressed by the submitter in the archive". Tracing back through the changelogs pointed to a 20101123 sequence.index change in which those sequences were marked as FAILED GENOTYPE QC. It would be easier if that reason were persisted.

Thanks again for the pointer above.
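Until the reason is persisted, one workaround is to keep older snapshots of sequence.index and diff the comment text per run, which would surface a change such as FAILED GENOTYPE QC becoming SUPPRESSED IN ARCHIVE. This is a sketch under assumptions not stated in the thread: that the run ID sits in column 3 and a free-text COMMENT column sits in column 23; verify both against the actual file headers.

```python
from typing import Dict, Iterable, Tuple

RUN_ID_COL = 2    # assumed: RUN_ID is column 3
COMMENT_COL = 22  # assumed: COMMENT is column 23, after the withdrawn date


def run_comments(lines: Iterable[str]) -> Dict[str, str]:
    """Map run ID -> COMMENT text for one sequence.index snapshot.

    A run can appear on several lines (e.g. paired FASTQ files); the last
    line wins, which is fine as long as those lines share the same comment.
    A header row, if present, is identical in both snapshots and so never
    shows up as a change.
    """
    comments: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= COMMENT_COL:
            continue  # blank or truncated row
        comments[fields[RUN_ID_COL]] = fields[COMMENT_COL]
    return comments


def changed_comments(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Runs present in both snapshots whose comment changed: run -> (old, new)."""
    return {run: (old[run], new[run])
            for run in old.keys() & new.keys()
            if old[run] != new[run]}
```

Keeping the old comment alongside the new one is the whole point: the later snapshot alone only shows the archive-level status.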


No problem. You could email the info address at 1000genomes.org to point out that keeping this information in future releases would be useful to you and others. They value feedback on what information users care about.
