What I cannot find are instructions on how to access or generate the same FASTQ files. These datasets seem to be quite essential for benchmarking purposes, but I am not sure of the best way to obtain them.
We are providing deep whole genome sequence data for the CEPH 1463 family in order to create a "platinum" standard comprehensive set of variant calls. These genomes include a trio (NA12877, NA12878 and NA12882) sequenced to greater than 200x depth of coverage, as well as a technical replicate (separate library and sequencing, but same DNA sample) of NA12882, also sequenced to greater than 200x. Additional information and analyses will be provided at www.platinumgenomes.org.
SRA Explorer returns several projects on SRA that have the WGS raw data for NA12878. You can type NA12878 in the search box and add the desired results to the collection; you then get direct links to the FASTQ files from the "saved datasets" button at the top of the page.
However, to identify which project provides the 30x dataset, you may have to check the number of reads in the last column or go to the project home page on NCBI (just click the accession in the second column of the results page).
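If you'd rather script the download than click through the links, here is a minimal Python sketch: given one run accession picked out of the SRA Explorer results, it asks ENA's filereport endpoint for the matching FASTQ FTP paths and fetches them. The accession is a placeholder, and the endpoint/field names are as I last used them, so double-check against the ENA portal API docs if the request fails.

```python
# Sketch: run accession -> FASTQ downloads via the ENA filereport API.
import urllib.request

run_accession = "SRRXXXXXXX"  # placeholder -- substitute a run you picked in SRA Explorer

report_url = (
    "https://www.ebi.ac.uk/ena/portal/api/filereport"
    f"?accession={run_accession}&result=read_run&fields=fastq_ftp&format=tsv"
)

with urllib.request.urlopen(report_url) as resp:
    lines = resp.read().decode().strip().splitlines()

# First line is the header ("fastq_ftp"); the second holds ';'-separated paths.
fastq_paths = lines[1].split(";") if len(lines) > 1 else []

for path in fastq_paths:
    url = "ftp://" + path.strip()
    filename = path.rsplit("/", 1)[-1]
    print(f"downloading {url} -> {filename}")
    urllib.request.urlretrieve(url, filename)
```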
I'm not sure about the naming/ordering of the folders in that directory. All the folders I checked contained at least alignments (.bam files) - you could always convert those back to FASTQ if you want to re-align or something (see the sketch below). Some of the folders do contain actual fastq/fasta files, like Garvan_NA12878_HG001_HiSeq_Exome. Also, the paper you linked specifically mentioned analyzing the BAM files (which makes sense) - not FASTQs.
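For the BAM-to-FASTQ route, here's a rough Python sketch that just shells out to samtools (assumed to be on your PATH) for a paired-end BAM; the file names are made up. Collating first groups the mates together, which samtools fastq needs in order to split reads into R1/R2 correctly.

```python
# Sketch: recover paired FASTQ from a BAM so it can be re-aligned.
import subprocess

bam = "NA12878.bam"                 # hypothetical input BAM
collated = "NA12878.collated.bam"   # mate-grouped intermediate

# Group read pairs by name (faster than a full name sort).
subprocess.run(["samtools", "collate", "-o", collated, bam], check=True)

# Write first/second mates to separate gzipped FASTQ files.
subprocess.run(
    [
        "samtools", "fastq",
        "-1", "NA12878_R1.fastq.gz",   # first mates
        "-2", "NA12878_R2.fastq.gz",   # second mates
        "-0", "/dev/null",             # reads with ambiguous pairing flags
        "-s", "/dev/null",             # singletons
        "-n",                          # don't append /1 and /2 to read names
        collated,
    ],
    check=True,
)
```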
The New York Genome Center (NYGC), funded by NHGRI, has sequenced 3202 samples from the 1000 Genomes Project sample collection to 30x coverage. Initially, the 2504 unrelated samples from the phase three panel from the 1000 Genomes Project were sequenced. Thereafter, an additional 698 samples, related to samples in the 2504 panel, were also sequenced. NYGC aligned the data to GRCh38 and those alignments are publicly available along with a data reuse statement. Details, including URLs for the data in ENA, are in our data portal (below) and are listed on our FTP site.
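As a rough sketch of pulling that file list programmatically rather than browsing the FTP site: the ENA study accession below is from memory (please confirm the exact accession in the data portal / FTP site mentioned above), and the snippet simply lists whatever FASTQ or submitted alignment files ENA reports for a given sample in the high-coverage panel.

```python
# Sketch: list high-coverage files for one sample from an assumed ENA study.
import csv
import urllib.request

study = "PRJEB31736"  # assumed accession for the 2504-sample 30x panel -- verify first
fields = "sample_alias,fastq_ftp,submitted_ftp"
url = (
    "https://www.ebi.ac.uk/ena/portal/api/filereport"
    f"?accession={study}&result=read_run&fields={fields}&format=tsv"
)

with urllib.request.urlopen(url) as resp:
    rows = list(csv.DictReader(resp.read().decode().splitlines(), delimiter="\t"))

# Keep only NA12878 and print whatever file paths ENA offers for it.
for row in rows:
    if row["sample_alias"] == "NA12878":
        print(row["fastq_ftp"] or row["submitted_ftp"])
```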
I already tried those. If you try to get the high-coverage data, you'll see that the downloaded file doesn't make any sense for high coverage...
What does this mean?