As a computational newbie to this field of science, and not a bioinformatician, I have a question about configuring the IT infrastructure of a start-up genomics enterprise. If I need to receive an entire human genome from someone (either in one blob or in many bloblets!), what are the best methods and formats for transmission, and how big would it be? It may be impractical to send them around attached to an email right now ;-} but what if I needed to design an IT infrastructure for the future in which it is possible to archive and transmit whole human genomes?
As Jarretinha points out, the data from the sequencer could be much bigger than the assembled genome. But how much bigger for an NGS machine? Any experiences?
I think you need to think a little bit more about your question. Nowadays people at the dry bench receive numerous huge files with genomes in redundant pieces. Do you want a raw, high-quality genome sequence or a real example of an NGS machine's output?
You should also check:
Just to mention, NGS human data for personal genomics is only useful if you're able to reconstruct haplotypes for all chromosome pairs. Remember that we're diploid, and dominance and Mendelian inheritance count for a lot. 3e9 is the haploid size without heterochromatin! Our true genome is much bigger! Most sequence files in databases are just a consensus. So you really must specify your needs and aims in order to get a useful answer.
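To put rough numbers on "how big would it be", here is a back-of-envelope sketch in Python. It starts from the 3e9 haploid figure above. Everything else is my assumption, not something from this thread: 2 bits per base as the tightest plain encoding of A/C/G/T, 1 byte per base for plain text, and a hypothetical 30x coverage for the raw reads, stored FASTQ-style with one base character plus one quality character per sequenced base. Header lines and compression are ignored, so treat the output as order-of-magnitude only.

```python
# Back-of-envelope storage estimates for a human genome.
# Only HAPLOID_BASES comes from the thread; the encodings and the
# 30x coverage used below are illustrative assumptions.

HAPLOID_BASES = 3_000_000_000  # ~3e9 bases, haploid, without heterochromatin


def genome_bytes(n_bases, bits_per_base):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base // 8


def fastq_bytes(n_bases, coverage, overhead_per_base=0.0):
    """Rough uncompressed FASTQ-style size.

    Each sequenced base costs one base character and one quality
    character (2 bytes); overhead_per_base is an optional allowance
    for read headers and separators.
    """
    return int(n_bases * coverage * (2 + overhead_per_base))


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


if __name__ == "__main__":
    packed = genome_bytes(HAPLOID_BASES, 2)      # 2-bit packed consensus
    text = genome_bytes(HAPLOID_BASES, 8)        # one ASCII character per base
    diploid = genome_bytes(2 * HAPLOID_BASES, 8) # both haplotypes as text
    raw = fastq_bytes(HAPLOID_BASES, coverage=30)  # assumed 30x coverage

    print(f"haploid, 2 bits/base: {to_gb(packed):.2f} GB")
    print(f"haploid, 1 byte/base: {to_gb(text):.2f} GB")
    print(f"diploid, 1 byte/base: {to_gb(diploid):.2f} GB")
    print(f"raw reads at 30x:     {to_gb(raw):.0f} GB")
```

Under these assumptions a haploid consensus comes to about 0.75 GB packed or 3 GB as plain text, while the raw reads come to roughly 180 GB before compression. That gap is why it matters whether you want the finished sequence or the machine output.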