I am a newbie to high-throughput DNA sequencing analysis and have just started a postdoc in this area. I used to do wet-lab biology, but I have a great deal of experience using Linux and writing code in Java and Python. The learning curve for DNA variant calling seems pretty steep to me.
Since I started working in this new lab, I have followed a Nature Protocols paper to run an RNA-Seq pipeline: TopHat - Cufflinks - Cuffmerge - Cuffdiff. The process is not hard; it just involves a lot of waiting on the computer.
I am not sure where to start with DNA variant calling. Can anyone point me to a paper or an online step-by-step protocol? I would appreciate any replies.
I would also add that some of your choices will depend on whether you are doing whole-genome sequencing or target enrichment such as exome sequencing. I think there is a general problem in genomics studies right now: people do not publish the full pipelines behind their analyses in enough detail. Still, if you look through papers in the area you are working on, especially ones from the last 2-3 years, you should get an idea of which tools people are using and some of their parameter settings. Most people stick with fairly default parameters, and my personal feeling is that BWA + GATK is probably the most widely used protocol in general.
That said, there are some papers showing that the variants called by different pipelines run on exactly the same data do not fully overlap. One from this year, published in Genome Medicine, is worth reading: http://genomemedicine.com/content/5/3/28
That publication will give you an idea of a few different pipelines. You may also want to check out GCAT (http://www.bioplanet.com/gcat/), which has test datasets you can use to test your pipeline choices against other pipelines on the same data. It also lets you compare any four pipeline setups at a time on the same datasets.
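Since you are comfortable with Python, here is a rough sketch of how you might check for yourself how much two pipelines agree on the same sample. It is only an illustration: the function names and the tiny made-up call sets are mine, not from the paper or GCAT, and it assumes plain tab-delimited VCF lines where a variant is identified by CHROM, POS, REF and ALT. It matches records exactly, so the same indel written in two different ways would be counted as a disagreement; a real comparison would normalise the calls first.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def read_variants(lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from the lines of a VCF file."""
    variants: Set[Variant] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list several ALT alleles
            variants.add((chrom, pos, ref, alt))
    return variants


def compare_calls(a: Set[Variant], b: Set[Variant]) -> Dict[str, float]:
    """Count shared and pipeline-specific calls, plus the Jaccard overlap."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up example records standing in for the output of two pipelines.
pipeline_a = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.",
    "chr2\t50\t.\tG\tA,C\t60\tPASS\t.",
]
pipeline_b = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t55\tPASS\t.",
    "chr2\t50\t.\tG\tA\t45\tPASS\t.",
    "chr3\t10\t.\tT\tC\t30\tPASS\t.",
]

summary = compare_calls(read_variants(pipeline_a), read_variants(pipeline_b))
print(summary)
```

For real files you would pass an open file handle instead of a list, e.g. `read_variants(open("calls_a.vcf"))`, where the file name is just a placeholder.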