Hi all, Simple question here: Has anyone heard or seen anything about the SEQC data set? http://www.fda.gov/ScienceResearch/BioinformaticsTools/MicroarrayQualityControlProject/default.htm#MAQC-IIIalsoknownasSEQC
Description from the link above:
The third phase of the MAQC project (MAQC-III), also called Sequencing Quality Control (SEQC), aims at assessing the technical performance of next-generation sequencing platforms by generating benchmark datasets with reference samples and evaluating advantages and limitations of various bioinformatics strategies in RNA and DNA analyses.
Sounds very useful, but I can't seem to find any data related to this project. Any ideas?
EDIT: It sounds like the MAQC phase 1 or phase 2 release contained 1000 genes validated with qPCR. Does anyone know how to access that data?
Thanks, Mikael. The paper references the SEQC dataset, but all I can find at the link are the HiSeq sequences rather than the expected values from the control data set. Any ideas?
I kept looking and found this repository: https://bitbucket.org/soccin/seqc/src/3c971e74c9a5df35f87880914e5168767870037c/data?at=master It corresponds to the Rapaport et al. differential expression benchmarking paper (http://genomebiology.com/content/14/9/R95#B37). You should be able to find what you need there.
I think the expected values (=qRT-PCR data) are here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5350
See this document for more info: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE5350/GSE5350_Summary_MAQC_DataSets.pdf
In the link provided by Mikael, you can search for items with the tag TAQ, e.g. MAQC_TAQ_1_A1. Those are the TaqMan RT-PCR data, which were used as the reference for the "true" expression of those 1000+ genes. However, this dataset was used as the reference for the microarray data, so it is not certain that A1 there corresponds to the same A1 in the RNA sequencing experiment.
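If it helps, the tag search could be sketched roughly like this in Python. It assumes you have already copied the sample titles from the GEO series page into a list; the function name and every title except MAQC_TAQ_1_A1 are hypothetical and only there for illustration.

```python
def select_taqman_samples(titles, tag="TAQ"):
    """Return the titles that have `tag` as one of their underscore-separated fields."""
    return [title for title in titles if tag in title.split("_")]


# Placeholder titles: only MAQC_TAQ_1_A1 is quoted from the thread,
# the other two are made up to show what gets kept and what gets dropped.
titles = ["MAQC_TAQ_1_A1", "MAQC_XYZ_1_A1", "MAQC_TAQ_1_B1"]

for title in select_taqman_samples(titles):
    print(title)
```

Matching on the whole field rather than a substring avoids picking up titles that merely happen to contain the letters "TAQ" inside another word.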
Thanks for the clarification.