I would like to reproduce some published results with my own analysis pipeline in order to validate it, so I need papers whose corresponding datasets can be downloaded. If someone has another idea, let me know!
Orc@
You can get the reads from the NCBI Sequence Read Archive (SRA). To obtain FASTQ format, see the relevant section of the SRA handbook; you will probably need the SRA Toolkit for the conversion.
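For what it's worth, here is a minimal sketch of the SRA Toolkit route, driven from Python. It assumes the toolkit's `prefetch` and `fasterq-dump` programs are installed and on your PATH; the helper names (`sra_to_fastq_commands`, `fetch_fastq`) and the output directory are made up for illustration, and the run accession is whatever the paper's data availability section lists.

```python
import shutil
import subprocess


def sra_to_fastq_commands(run_accession, outdir="fastq"):
    """Return the two SRA Toolkit command lines for one run accession."""
    return [
        # Download the run's .sra archive locally.
        ["prefetch", run_accession],
        # Convert it to FASTQ; paired reads are written to separate files.
        ["fasterq-dump", run_accession, "--split-files", "--outdir", outdir],
    ]


def fetch_fastq(run_accession, outdir="fastq"):
    """Run both commands; fails early if the SRA Toolkit is not installed."""
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise RuntimeError(tool + " not found on PATH (install the SRA Toolkit)")
    for cmd in sra_to_fastq_commands(run_accession, outdir):
        subprocess.run(cmd, check=True)


# Usage (downloads data, so it is left commented out):
# fetch_fastq("SRR000001")  # placeholder accession, replace with the paper's run
```

Building the command lists separately from running them makes it easy to print and check them before starting a large download.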
You could also look into the European Nucleotide Archive at the EBI.
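If you go the ENA route, the FASTQ files are served directly, so no conversion is needed. Below is a rough sketch that builds an ENA Portal API "filereport" request for a study or run accession and turns the returned tab-separated text into download URLs. The function names are hypothetical, and it assumes you only ask for the `run_accession` and `fastq_ftp` fields.

```python
import csv
import io
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build the file report URL for a study or run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def fastq_urls(tsv_text):
    """Map each run accession to its list of FASTQ download URLs."""
    urls = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Paired-end runs list several files separated by ';' with no scheme.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls[row["run_accession"]] = ["ftp://" + p for p in paths]
    return urls


# Usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# text = urlopen(filereport_url("PRJEB0000")).read().decode()  # placeholder accession
# print(fastq_urls(text))
```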
NOTE: Shameless plug for our software....
You could have a look at the SRAdb R/Bioconductor package. We pull down all the metadata from the sequence read archives at EBI, NCBI, and DDBJ and consolidate that into a SQLite file that can be used from R or any other language that has a SQLite interface.
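To illustrate the "any language with a SQLite interface" part, here is a small Python sketch that lists the runs belonging to one study. It assumes the metadata file is named `SRAmetadb.sqlite` and has an `sra` table with `run_accession`, `study_accession` and `library_strategy` columns; please check the actual schema of your copy before relying on it. The rows and accessions below are made up so the example runs without the real (large) file.

```python
import sqlite3


def runs_for_study(conn, study_accession):
    """Return (run_accession, library_strategy) rows for one study."""
    query = (
        "SELECT run_accession, library_strategy "
        "FROM sra WHERE study_accession = ? ORDER BY run_accession"
    )
    return conn.execute(query, (study_accession,)).fetchall()


# Stand-in database with made-up rows (assumed table and column names).
# With the real file you would use: conn = sqlite3.connect("SRAmetadb.sqlite")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sra (run_accession TEXT, study_accession TEXT, library_strategy TEXT)"
)
conn.executemany(
    "INSERT INTO sra VALUES (?, ?, ?)",
    [
        ("SRR_EXAMPLE_2", "SRP_EXAMPLE", "WGS"),
        ("SRR_EXAMPLE_1", "SRP_EXAMPLE", "WGS"),
        ("SRR_OTHER_1", "SRP_OTHER", "RNA-Seq"),
    ],
)

for run, strategy in runs_for_study(conn, "SRP_EXAMPLE"):
    print(run, strategy)
```

The run accessions you get back can then be passed to whichever download route you prefer.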
If you are looking for non-human sequences, you can use the European Nucleotide Archive (ENA) at EBI. But if the papers you are looking at use human data, you need to go to the European Genome-phenome Archive (EGA) at EBI or the database of Genotypes and Phenotypes (dbGaP) at NCBI. Be aware that, despite being public, almost all human data from research studies are covered by consent agreements, so you need to request access before you can download or use the data. If you do not need to replicate a specific study, it would be easier to use data from the 1000 Genomes Project or similar projects, which you can download directly from the 1kg web site.
As a side note: for downloading these BIG data sets it is better to use Aspera than FTP when that option is provided (see 1kg data access).
What kind of analysis are you trying to perform?
Same question here: I'd like to find a set of FASTQ files related to a given article, to show my students how to process this kind of data.
Thanks for your answers, but I'm looking for an article (published results) in which the raw data are available.
I would like to analyse indels and SNPs in the human genome.
I have a question. Since it is very related I chose to post it here.
Q: When you say that you want to validate your analysis pipeline with published/publicly available data, do you get any rights to publish results based on your analysis of somebody else's data? What are the norms for using publicly available NGS data?