I notice that the 20100804 release of the 1000 Genomes genotypes has subjects that are missing from the later releases (e.g. NA06985).
Would anyone know why data on those subjects were culled?
Check the changelogs on the 1000 Genomes FTP site. Better still, check the sequence.index file: it lists all the samples and has a "Withdrawn" column (column 21) that tells you whether a sample has been withdrawn, and a "withdrawn date" column (column 22) that tells you when it was removed. Once you know the date, you can find the changelog that matches it and work out why. Generally, things get withdrawn because they were found to be misidentified, contaminated, or badly sequenced.
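If it helps, here is a rough Python sketch of that lookup. It assumes the sequence.index file is tab-delimited, that the sample name is in column 10, that a withdrawn row has "1" in column 21, and that a free-text comment sits in column 23. Only columns 21 and 22 are stated above; the rest are assumptions, so check them against the header of the file you downloaded. The function name withdrawn_samples is just something made up for this example.

```python
from collections import defaultdict


def withdrawn_samples(lines, sample_col=10, withdrawn_col=21,
                      date_col=22, comment_col=23):
    """Map each withdrawn sample to its sorted (withdrawn_date, comment) pairs.

    Column numbers are 1-based. Columns 21 and 22 come from the post above;
    the sample column (10), the comment column (23) and the "1" flag value
    are assumptions about the file layout.
    """
    needed = max(sample_col, withdrawn_col, date_col)
    found = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < needed:
            continue  # blank or truncated line
        if fields[withdrawn_col - 1].strip() != "1":
            continue  # header row or a run that is still live
        # The comment column is optional; fall back to an empty string.
        comment = fields[comment_col - 1] if len(fields) >= comment_col else ""
        found[fields[sample_col - 1]].add((fields[date_col - 1], comment))
    return {sample: sorted(pairs) for sample, pairs in found.items()}
```

You would call it with an open file handle, e.g. withdrawn_samples(open("sequence.index")), and then look up the sample of interest in the returned dictionary to get the date to match against the changelogs.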
Thanks for the tip, which I have followed up. Those subjects were apparently marked "SUPPRESSED IN ARCHIVE", which means "The run has been suppressed by the submitter in the archive". Tracing back through the changelogs pointed to a 20101123 sequence index change where those sequences were marked as FAILED GENOTYPE QC. It would be easier if that reason were persisted.
Thanks again for the pointer above.
No problem. You could email the info address at 1000genomes.org to point out that keeping that information in future would be useful to you and others. They value feedback on what information users care about.