Question

Where Can I Find FASTQ Data (NGS Raw Data) And Published Results?

10

14.1 years ago

Orca ▴ 140

I would like to reproduce some published results with my own analysis pipeline, so I need the corresponding datasets to be downloadable. I have to validate my pipeline. If someone has another idea... Let me know!!

Orc@

next-gen sequencing data analysis • 17k views

ADD COMMENT • link updated 13.9 years ago by Pablo Marin-Garcia ★ 2.0k • written 14.1 years ago by Orca ▴ 140

0

What kind of analysis are you trying to perform?

ADD REPLY • link 14.1 years ago by Jts ★ 1.4k

0

same question here: I'd like to find a set of fastq files related to a given article to show my students how to process this kind of data.

ADD REPLY • link 14.1 years ago by Pierre Lindenbaum 164k

0

Thanks for your answers, but I'm looking for an article (published results) in which the raw data are available.

ADD REPLY • link 14.1 years ago by Orca ▴ 140

0

I would like to perform some indel and SNP analysis on the human genome.

ADD REPLY • link 14.1 years ago by Orca ▴ 140

0

I have a question. Since it is closely related, I chose to post it here.

Q: When you say that you want to validate your analysis pipeline with published/publicly available data, do you get any rights to publish results based on your analysis of somebody else's data? What are the norms for using publicly available NGS data?

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 10.6 years ago by rohan ▴ 110

score 6 · Answer 1 · 2010-11-08

6

14.1 years ago

biobot 0.0.77.a.1099 6.2k

From the NCBI Sequence Read Archive. To obtain FASTQ format, see the relevant section in the SRA handbook; you will probably need the SRA Toolkit.
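
As a rough illustration of the SRA Toolkit step, here is a minimal Python sketch that builds and, if the tool is installed, runs a `fastq-dump` call. The helper names `fastq_dump_command` and `dump_fastq` are made up for this example, `SRR000001` is only a placeholder accession, and it assumes a toolkit version that can resolve a run accession over the network. Check the toolkit documentation for the options your version supports.

```python
import shutil
import subprocess


def fastq_dump_command(run_accession, outdir="."):
    """Build the argument list for an SRA Toolkit fastq-dump call."""
    return [
        "fastq-dump",
        "--split-files",   # write paired-end mates to separate files
        "--gzip",          # compress the FASTQ output
        "--outdir", outdir,
        run_accession,
    ]


def dump_fastq(run_accession, outdir="."):
    """Run fastq-dump if it is on PATH; return True if it was run."""
    cmd = fastq_dump_command(run_accession, outdir)
    if shutil.which(cmd[0]) is None:
        print("fastq-dump not found on PATH; would have run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Placeholder accession, used for illustration only.
    print(" ".join(fastq_dump_command("SRR000001", outdir="fastq")))
```

Keeping the command construction separate from the execution makes it easy to log or inspect the exact call before downloading anything large.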

ADD COMMENT • link 14.1 years ago by biobot 0.0.77.a.1099 6.2k

score 6 · Answer 2 · 2010-11-08

6

14.1 years ago

User 59 13k

You could also look into the European Nucleotide Archive at the EBI.
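
For what it is worth, ENA lists direct FASTQ download links per run. The sketch below assumes the ENA Portal API "filereport" endpoint and its `fastq_ftp` / `fastq_md5` fields as they exist at the time of writing; the URL, the field names, and the helper names `filereport_url` and `parse_filereport` are assumptions of this example, not something stated in the thread. It only builds the query URL and parses a tab-separated response, so nothing is downloaded here.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API; verify against the current ENA docs.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a filereport query URL for a run, experiment, or study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Return {run_accession: [(fastq_url, md5), ...]} from a filereport TSV."""
    runs = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Paired-end runs list several files separated by semicolons.
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        runs[row["run_accession"]] = list(zip(urls, md5s))
    return runs
```

The text fetched from `filereport_url(...)` (for example with `urllib.request.urlopen`) can be passed to `parse_filereport`, and the MD5 values can then be used to check each downloaded file.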

ADD COMMENT • link 14.1 years ago by User 59 13k

score 5 · Answer 3 · 2010-11-09

5

14.1 years ago

Sean Davis 27k

NOTE: Shameless plug for our software....

You could have a look at the SRAdb R/Bioconductor package. We pull down all the metadata from the sequence read archives at EBI, NCBI, and DDBJ and consolidate that into a SQLite file that can be used from R or any other language that has a SQLite interface.
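
To give an idea of what querying such a SQLite file from another language might look like, here is a small Python sketch using the standard `sqlite3` module. It runs against a toy in-memory database: the table name `sra`, its columns, and every accession in it are hypothetical stand-ins chosen for this example, so inspect the schema of the real metadata file before adapting the query.

```python
import sqlite3


def find_runs(conn, strategy, organism):
    """Return (run_accession, study_accession) rows matching the filters."""
    query = (
        "SELECT run_accession, study_accession FROM sra "
        "WHERE library_strategy = ? AND scientific_name = ? "
        "ORDER BY run_accession"
    )
    return conn.execute(query, (strategy, organism)).fetchall()


def make_toy_db():
    """Create an in-memory database with a hypothetical 'sra' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sra (run_accession TEXT, study_accession TEXT, "
        "library_strategy TEXT, scientific_name TEXT)"
    )
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO sra VALUES (?, ?, ?, ?)",
        [
            ("RUN_B", "STUDY_1", "WGS", "Homo sapiens"),
            ("RUN_A", "STUDY_1", "WGS", "Homo sapiens"),
            ("RUN_C", "STUDY_2", "RNA-Seq", "Mus musculus"),
        ],
    )
    return conn


if __name__ == "__main__":
    for run, study in find_runs(make_toy_db(), "WGS", "Homo sapiens"):
        print(run, study)
```

With a real metadata file, `sqlite3.connect(":memory:")` would be replaced by the path to the downloaded SQLite file, and the query adjusted to the actual table and column names.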

ADD COMMENT • link 14.1 years ago by Sean Davis 27k

0

A great idea ! I will use it as soon as possible. Thanks

ADD REPLY • link 14.1 years ago by Puthier ▴ 250

0

Is there a SQLite interface for perl ?

ADD REPLY • link 10.6 years ago by rohan ▴ 110

score 5 · Answer 4 · 2011-06-16

If you are looking for non-human sequences, you can use the European Nucleotide Archive (ENA) at EBI. But if the papers you are looking at are about human data, you need to go to the European Genome-phenome Archive (EGA) at EBI or the database of Genotypes and Phenotypes (dbGaP) at NCBI. Be aware that, despite being public, almost all human data from research studies are covered by consent agreements, so you need to apply for access before you can download or use the data. Unless you specifically want to replicate a study, it would be easier to use data from the 1000 Genomes Project or similar projects, which you can download directly from the 1kg web site.

As a side note: for downloading these BIG data sets, it is better to use Aspera than FTP when that option is provided (see 1kg data access).