I am a software developer investigating software optimization techniques. I got maximal marks in high-school biology and moved on to period 3 elements.
I was told that CASAVA produces enormous output files, and I am interested in procuring input that will make it produce an enormous output file. I want at least a few hundred gigabytes, ideally terabytes. The validation examples don't produce output anywhere near that size.
Are there any publicly available input files that can do this? Are there genomes available online?
==Edits==
- I want to profile certain disk I/O performance characteristics when writing large files. I have a hard drive with lots of wires and some fancy equipment.
- I understand that CASAVA takes in BCL files and spits out FASTQ. I want it to spit out a large FASTQ file, so I guess I need the BCL files? Is there a public repository?
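If the goal is just a large FASTQ file rather than real biology, the format itself is simple: each read is four text lines (an `@`-prefixed name, the bases, a `+` separator, and a quality string the same length as the bases). The sketch below writes random records until a target size is reached. It is an illustration under stated assumptions, not CASAVA's output: the read names (`synthetic_read_N`) are hypothetical, and the qualities assume Phred+33 encoding.

```python
import random

BASES = "ACGT"


def make_fastq_record(read_index: int, read_length: int, rng: random.Random) -> str:
    """Build one four-line FASTQ record with random bases and qualities."""
    # Hypothetical read name; real CASAVA names encode instrument, lane, tile, etc.
    header = f"@synthetic_read_{read_index}"
    seq = "".join(rng.choices(BASES, k=read_length))
    # Assumed Phred+33 encoding: quality q is stored as chr(q + 33); q drawn from 2..40.
    qual = "".join(chr(q + 33) for q in rng.choices(range(2, 41), k=read_length))
    return f"{header}\n{seq}\n+\n{qual}\n"


def write_fastq(path: str, target_bytes: int, read_length: int = 100, seed: int = 0) -> int:
    """Write random records to `path` until at least `target_bytes` are written."""
    rng = random.Random(seed)
    written = 0
    read_index = 0
    # newline="\n" disables newline translation, so the byte count stays exact on every OS.
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while written < target_bytes:
            record = make_fastq_record(read_index, read_length, rng)
            out.write(record)
            written += len(record)  # ASCII text, so characters == bytes
            read_index += 1
    return written
```

For example, `write_fastq("big.fastq", 200 * 10**9)` would aim for roughly 200 GB. Pure Python generation like this is CPU-bound and far slower than a disk, so for I/O profiling it is better to generate a block once and write it repeatedly.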
What is the input that your algorithm needs? Are you looking for FASTQ files, or something else?
Can you fill us in on what you are trying to accomplish?
If you are only doing read/write benchmarks, generate the data on the fly with a small program and write it out. That's easier than downloading data and trimming it to the correct size.
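A minimal sketch of that approach, assuming the goal is sequential write throughput: build one buffer of random bytes up front, write it in chunks until the requested total is reached, and `fsync` before stopping the clock so the OS page cache doesn't hide the device's real speed. The function name and chunk size are illustrative choices, not a standard tool.

```python
import os
import time


def benchmark_write(path: str, total_bytes: int, chunk_size: int = 8 * 1024 * 1024) -> float:
    """Write `total_bytes` of generated data to `path` and return throughput in MB/s."""
    # Generate the data once so the timed loop measures disk writes, not data generation.
    chunk = memoryview(os.urandom(chunk_size))
    remaining = total_bytes
    start = time.perf_counter()
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(chunk_size, remaining)
            out.write(chunk[:n])  # memoryview slicing avoids copying the buffer
            remaining -= n
        out.flush()
        # Force the data out of the OS page cache so the timing reflects the device.
        os.fsync(out.fileno())
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6  # MB here means 10**6 bytes
```

Usage might look like `benchmark_write("/mnt/testdisk/big.bin", 200 * 10**9)`, where the path is a placeholder for a file on the drive under test. If the content needs to look like FASTQ, the random buffer can be swapped for a pre-generated block of FASTQ text without changing the timing loop.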