Hi everyone,
I'm looking for NGS/array sample datasets for teaching purposes. Most people I know who have this kind of data guard it somewhat jealously. I'm interested in the whole data pipeline, so raw data is very much wanted too. It doesn't need to be a huge dataset, just a very illustrative one. If you know where I can find any, please let me know!
-- Edit --
Just to clarify: I don't need a complete dataset from each pipeline step. A small real sample is enough. The "real" part is crucial. Simulations are welcome, too.
One thing to note is that many of the raw data processing tools are not easy to install and require a full directory structure to operate correctly. Here is a page with information and official guides for the Illumina pipeline that I prepared for our group.
How "raw" do you want it? Take the Illumina system. Data processing starts with images. Image analysis transforms those into intensities; intensities are transformed into basecalls; basecalls are then mapped to the genome.
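To make the intensities-to-basecalls step concrete, here is a toy sketch in Python. It is not how the Illumina software does it (real base callers also correct for channel crosstalk and phasing); it only assumes one intensity value per channel per cycle and picks the brightest one. The channel order and the numbers are made up for illustration.

```python
# Toy base caller for a single cluster.
# Assumption: each cycle gives four intensities in the order A, C, G, T.
CHANNELS = "ACGT"

def call_bases(intensities):
    """Return the called sequence, picking the brightest channel per cycle."""
    calls = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)

# Made-up intensities for four cycles (not real instrument output).
example = [
    [910.0, 35.2, 12.1, 40.0],
    [22.4, 18.9, 870.3, 51.7],
    [15.0, 640.8, 30.2, 28.8],
    [44.1, 20.0, 25.5, 799.9],
]

print(call_bases(example))  # prints AGCT
```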
Would anyone provide big image data? The rawest data I have seen in archives so far is FASTQ.
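For anyone who has not looked inside a FASTQ file: each read takes four lines (name, sequence, a "+" separator, and a quality string). A minimal reader might look like the sketch below. It assumes well-formed four-line records and Phred+33 quality encoding; older Illumina files used a different offset, so check your data. The function name and the sample record are made up.

```python
def parse_fastq(lines):
    """Yield (name, sequence, qualities) from an iterable of FASTQ lines.

    Assumes 4-line records and Phred+33 quality encoding.
    """
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of 4 lines")
    for i in range(0, len(stripped), 4):
        header, seq, _separator, qual = stripped[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record does not start with '@': {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# Made-up record for illustration.
record = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for name, seq, quals in parse_fastq(record):
    print(name, seq, quals)  # prints: read1 ACGT [40, 40, 20, 0]
```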
I was only asking this because Jarretinha mentioned that he wants to learn about the "whole pipeline". The images from a single run total around 2 terabytes; at that size, the fastest way to transfer them is shipping hard disks.
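For a rough sense of scale, the time to move 2 terabytes over a network can be estimated as below. The link speeds are assumptions picked for illustration, and the calculation ignores protocol overhead and assumes the link is fully saturated, so real transfers would be slower.

```python
SECONDS_PER_DAY = 86400

def transfer_days(size_bytes, link_bits_per_second):
    """Idealized transfer time in days for a fully used link."""
    return size_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY

RUN_SIZE_BYTES = 2e12  # about 2 terabytes (decimal), as quoted above

# Hypothetical sustained link speeds.
for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{label}: {transfer_days(RUN_SIZE_BYTES, bps):.1f} days")
```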
Yeah, when I say raw, I mean as raw as you can get. Raw as in fresh meat. I do know that a whole run on any NGS machine is too much data for any connection. But a dozen images are more than enough to illustrate image processing concepts on real NGS data. And it needs to be real because my target audience is mainly composed of non-computer/biology geeks. They really must feel the complexity of the task. By the way, surface mail is allowed in answers :)
No snail mail necessary ;) I just googled it.