Question

Paper Or Detailed Tutorial For DNA Variant Calling Pipeline? Need Help To Start

2

11.9 years ago

newDNASeqer ▴ 790

I am a newbie to high-throughput DNA sequencing analysis and have just started my postdoc in this area. I used to do wet-lab biology, but I have a great deal of experience using Linux and writing code in Java and Python. The learning curve for DNA variant calling seems pretty steep to me.

Since I started working in this new lab, I have followed a Nature Protocols paper to run an RNA-Seq pipeline: TopHat - Cufflinks - Cuffmerge - Cuffdiff. I think the process is not hard; it just involves a lot of waiting time on the computer.

I am not sure where I should start for DNA variant calling. Can anyone point me to a paper or an online step-by-step protocol? I appreciate your reply.

dna variant calling pipeline • 5.6k views

ADD COMMENT • link updated 11.8 years ago by rob234king ▴ 610 • written 11.9 years ago by newDNASeqer ▴ 790

score 5 · Answer 1 · 2013-07-08

5

11.9 years ago

Ashutosh Pandey 12k

Some posts:

What is the best pipeline for human whole exome sequencing?

whole genome analysis pipeline (Illumina)

workflow or tutorial for SNP calling?

This paper also uses some of the current practices in NGS.

http://genomebiology.com/content/13/8/R72

ADD COMMENT • link 11.9 years ago by Ashutosh Pandey 12k

0

I would add as well that some of your choices will depend on whether you are doing whole genome sequencing or target enrichment such as exome sequencing. I think there is a general problem in genomics studies right now: people do not publish the full pipelines for their analyses in enough detail. Still, if you look through papers in the area you are working on, especially ones from the last 2-3 years, you should get an idea of which tools people are using and some of the parameter settings. Most people stick with fairly default parameters, and my personal feeling is that BWA + GATK is probably the most widely used protocol in general.
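
For orientation, a BWA + GATK run of the kind mentioned above might be laid out roughly as follows. This is only a sketch under several assumptions: it uses the GATK4-style `gatk` wrapper and tool names, which postdate this thread; the file names and the helper `variant_calling_commands` are hypothetical; and steps such as base quality score recalibration and variant filtering are left out. The function only assembles command strings for review and does not run anything.

```python
import shlex


def variant_calling_commands(ref, fq1, fq2, sample, threads=4):
    """Assemble shell commands for a basic BWA + GATK germline run.

    Hypothetical helper: it returns the commands as strings and never
    executes them. All file names are placeholders.
    """
    ref_q, fq1_q, fq2_q = (shlex.quote(p) for p in (ref, fq1, fq2))
    sorted_bam = shlex.quote(f"{sample}.sorted.bam")
    dedup_bam = shlex.quote(f"{sample}.dedup.bam")
    metrics = shlex.quote(f"{sample}.dup_metrics.txt")
    vcf = shlex.quote(f"{sample}.vcf.gz")

    # GATK needs a read group on every read; bwa expands the literal "\t".
    read_group = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA")

    return [
        # One-time reference preparation.
        f"bwa index {ref_q}",
        f"samtools faidx {ref_q}",
        f"gatk CreateSequenceDictionary -R {ref_q}",
        # Align paired-end reads and coordinate-sort the output.
        f"bwa mem -t {threads} -R {read_group} {ref_q} {fq1_q} {fq2_q}"
        f" | samtools sort -@ {threads} -o {sorted_bam} -",
        # Mark PCR/optical duplicates, then index the BAM.
        f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam} -M {metrics}",
        f"samtools index {dedup_bam}",
        # Call variants.
        f"gatk HaplotypeCaller -R {ref_q} -I {dedup_bam} -O {vcf}",
    ]


if __name__ == "__main__":
    for cmd in variant_calling_commands("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1"):
        print(cmd)
```

Printing the commands first makes it easy to check each step against the documentation for whichever tool versions you have installed before running anything.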

That said, there are some papers showing the non-overlap of variants called by different pipelines run on the exact same data. One from this year was published in Genome Medicine and is worth reading: http://genomemedicine.com/content/5/3/28

That publication will give you an idea of a few different pipelines. You may also want to check out GCAT: http://www.bioplanet.com/gcat/ which has test datasets you can use to test your pipeline choices against other pipelines on the same data. It also lets you compare up to four pipeline setups at a time on the same datasets.
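
If you want a quick feel for this kind of non-overlap on your own data, a rough comparison of two call sets can be sketched in a few lines of Python. This is a minimal sketch, not the method used in that paper: the function names and the toy records are made up for illustration, variants are matched on exact (chromosome, position, REF, ALT), and no normalization is done, so the same indel written in two different ways would be counted as a disagreement.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the lines of a VCF file."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic sites list several ALT alleles separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_callsets(a, b):
    """Summarise the overlap between two sets of variant keys."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls divided by all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Toy records invented for illustration, not real calls.
    pipeline_a = [
        "##fileformat=VCFv4.1",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t200\t.\tC\tT\t40\tPASS\t.",
        "chr2\t50\t.\tG\tA\t60\tPASS\t.",
    ]
    pipeline_b = [
        "##fileformat=VCFv4.1",
        "chr1\t100\t.\tA\tG\t55\tPASS\t.",
        "chr2\t50\t.\tG\tA,C\t30\tPASS\t.",
        "chr3\t10\t.\tT\tC\t20\tPASS\t.",
    ]
    print(compare_callsets(load_variants(pipeline_a), load_variants(pipeline_b)))
```

For real files you would pass an open file handle to `load_variants` instead of a list, for example `load_variants(open("calls.vcf"))` for an uncompressed VCF.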

ADD REPLY • link 11.9 years ago by DG 7.3k

score 0 · Answer 2 · 2013-07-23

0

11.8 years ago

rob234king ▴ 610

I’ve put together a tutorial website with four comprehensive core tutorials: RNA-Seq, ChIP-Seq, genome assembly, and SNP calling.

http://elvis.ccc.cranfield.ac.uk/CUBELP2/

ADD COMMENT • link 11.7 years ago by rob234king ▴ 610