I am looking to analyze an experiment that deals with samples from related people. The sequencing was outsourced and the experiment's variant calling step seems to involve the 17 platinum genomes.
I would like to gain a deeper understanding of the benefit of using the platinum genomes in my experiment. In other words, what have I gained by including this data in my experiment? How best can I leverage these results downstream?
Thank you in advance for your inputs!
What do you mean by: "The sequencing was outsourced and the experiment's variant calling step seems to involve the 17 platinum genomes."?
We had a different institution do the sequencing and bioinformatics for us, because it was cheaper than in-house. The variant calling part of their workflow involves our samples + the 17 platinum genomes.
Zev may be asking what you mean by "the 17 platinum genomes". I have never heard of that before. Google suggests it may be some related human genomes that Illumina sequenced to 30x-200x; is that correct?
As for gaining things... well, what is your experiment? Using deeply sequenced related individuals can be very useful for calibrating variant-calling software; I used that approach in the past to demonstrate that a certain company's indel calls were completely bogus, because they were impossible to explain by heredity. But 30x is not deep sequencing by any standard, and it seems that different library types were used for different members, so each person has different biases... this is a very bad experimental design for calibrating software. It's almost certainly better than completely uncalibrated software, but not something I would advertise.
Anyway, I can't think of any use for a single set of random related genomes other than for calibration, and this set does not seem to be a very good choice for that, assuming we are talking about the same thing.
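To make the heredity check concrete, here is a minimal Python sketch of how one could count calls in a trio that cannot be explained by inheritance. It is only an illustration, not what the vendor's pipeline does: the function names and the genotype encoding (a pair of allele indices per person, `None` for a missing call) are my own assumptions, and it only covers diploid autosomal sites and ignores real de novo mutations.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child's two alleles can be explained by
    one allele inherited from each parent (diploid, autosomal site)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_error_rate(sites):
    """Fraction of fully called trio sites that violate Mendelian inheritance.

    `sites` is an iterable of (child, mother, father) genotype tuples,
    e.g. ((0, 1), (0, 0), (0, 1)). Sites with any missing allele (None)
    are skipped. This layout is hypothetical, chosen for illustration.
    """
    checked = 0
    errors = 0
    for child, mother, father in sites:
        if None in child or None in mother or None in father:
            continue  # cannot judge a site with a missing call
        checked += 1
        if not is_mendelian_consistent(child, mother, father):
            errors += 1
    return errors / checked if checked else 0.0


# Toy data: 0 = reference allele, 1 = alternate allele.
example_sites = [
    ((0, 1), (0, 0), (0, 1)),        # consistent: alt allele comes from father
    ((1, 1), (0, 0), (0, 1)),        # violation: mother has no alt allele to give
    ((0, 0), (0, 1), (0, 1)),        # consistent: ref allele from each parent
    ((None, None), (0, 0), (0, 0)),  # skipped: child not called
]
rate = mendelian_error_rate(example_sites)
```

A caller whose violation rate in a pedigree is far above what de novo mutation could plausibly explain is producing calls that heredity cannot account for, which is the kind of evidence I was describing.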
I hadn't heard of the platinum genomes before I saw these experiment results either. Sure, I had seen occurrences of "NA12878" in the wild quite a few times, but never thought it'd be a part of some set of high quality genomes.
Yes, these 17 are Illumina's CEPH pedigree genomes that have been extensively studied for variant calling.
I wonder why Illumina would use anything but the best protocols to create a "platinum" data set. Anyway, I guess I'll ask the vendor why they used the set and whether we benefit from it.