Which whole-genome datasets would you recommend for benchmarking (e.g., aligners, callers, etc.)?
While NA12878 (as well as NA12891 and NA12892) have multiple datasets available and have been used extensively for benchmarking, I wanted to see if the community had recommendations for other whole-genome datasets that may have been sequenced using more state-of-the-art technology. Please also provide the URL to the dataset(s), if available. Thanks!
I greatly appreciate the reminder about the Platinum Genomes from Illumina. I've accessed them and will be using the NA12878 FASTQs for benchmarking.