I wrestled with this question myself some time ago, but I was just spurred to revisit it by David Ewing Duncan, author of Experimental Man. He has just received his data from Complete Genomics, and now he is asking for suggestions on what to do with it:
This is an appeal: Send me your ideas for how best to interpret my newly sequenced complete genome!
I think this group is better positioned than most to have this discussion. Some time ago I answered this for myself, and I'll summarize that here and add a couple of things. But I'm also wondering what you guys would do with your data. And I'll point David to this thread in the comments at his blog.
1) Assessment and QC: check the files, formats, etc.; compare the sequences to the reference genome, to well-known genes, and to other available sequences in GenBank. Now I'd also add that I would cross-check against my 23andMe data to make sure the two say the same thing. And if not, I'd investigate why.
2) Build myself a personal browser. Probably a UCSC Browser because that's what I know best, but I'd consider others. [well, ok, I'd hire one of you guys to do that, technically]. There's a bunch of things I'd want to consider adding as tracks and annotations, but I haven't thought them all through yet. I'd also plan to create custom tracks from my own reading/research that I want to link to regions/genes/variations. Like a literature track, I think. This would be how I'd store stuff I want to access later, and accumulate over time. Personal curation, I guess.
3) Look closer at well-characterized and medically relevant genes, probably based on the NHGRI catalog, GeneTests, etc. I know this is looking under the flashlight, but it would still be the part I'm most curious about--and the best chance for new understandings and actionable stuff. I'd also look specifically for CNVs in my data.
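To make steps 1 and 2 a bit more concrete, here is a minimal Python sketch of the 23andMe cross-check, with the disagreements written out as a BED custom track that could be loaded into a browser. It makes several assumptions: the array data is in the 23andMe raw-data layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines), and the sequencing calls have already been reduced to a plain rsid-to-genotype dictionary. Both are assumed to be on the same genome build and strand. The function names are made up for illustration and are not part of any existing tool.

```python
import io


def parse_23andme(handle):
    """Read 23andMe-style raw data into {rsid: (chrom, pos, genotype)}.

    Assumes '#' comment lines followed by tab-separated
    rsid, chromosome, position, genotype columns.
    """
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        if "-" in genotype:  # skip no-calls such as "--"
            continue
        # Sort the alleles so that "AG" and "GA" compare as equal.
        calls[rsid] = (chrom, int(pos), "".join(sorted(genotype)))
    return calls


def concordance(array_calls, sequence_calls):
    """Compare genotypes at shared rsids.

    sequence_calls is a hypothetical {rsid: genotype} dict derived
    from the whole-genome data. Returns (shared count, sorted list
    of discordant rsids).
    """
    shared = array_calls.keys() & sequence_calls.keys()
    mismatches = sorted(
        rsid for rsid in shared
        if array_calls[rsid][2] != "".join(sorted(sequence_calls[rsid]))
    )
    return len(shared), mismatches


def discordant_bed(array_calls, mismatches, name="discordant_calls"):
    """Format discordant sites as a BED custom track (0-based start)."""
    lines = [f'track name="{name}" description="Array vs. sequence mismatches"']
    for rsid in mismatches:
        chrom, pos, _ = array_calls[rsid]
        # UCSC-style names: "1" -> "chr1", "MT" -> "chrM".
        ucsc_chrom = "chrM" if chrom == "MT" else f"chr{chrom}"
        lines.append(f"{ucsc_chrom}\t{pos - 1}\t{pos}\t{rsid}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw = "# rsid\tchromosome\tposition\tgenotype\nrs1\t1\t1000\tAG\nrs2\t2\t2000\tCC\n"
    array = parse_23andme(io.StringIO(raw))
    n_shared, bad = concordance(array, {"rs1": "GA", "rs2": "CT"})
    print(n_shared, bad)
    print(discordant_bed(array, bad), end="")
```

The same BED layout (chromosome, start, end, label) would also work for the literature track idea in step 2: one line per region, with the label pointing at whatever paper or note I want to remember.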
Your turn. What would you do?
EDIT: the question software wants me to offer a bounty now. Seriously--no one has thought about a workflow for this kind of data? I found this discussion fascinating, but I was really looking for the outlines of a process. If I can figure out the bounty thing maybe I'll add one....
EDIT 2: I started a bounty. Although the software told me it was going to be a 50, it says 100--whatever. I'm looking for a workflow--a series of steps you would perform to explore one person's whole genome data if it was given to you. It doesn't have to be yours. And it doesn't have to be written in Perl. An outline is fine.
Try to avoid freaking out when you find a bunch of markers associated with diseases?
Heh. Step 4: obtain counselling.
Should this question be community wiki?
I don't know. I thought it was a real bioinformatics process question, but as there may not be a single answer, perhaps it should be. I'll switch it.