Best resources for understanding GATK best practices pipeline (or similar)
5.6 years ago
Tails ▴ 80

I've been trying to find a resource for understanding the various steps in the GATK pipeline or, more broadly, mapping, alignment, and variant calling in general. There seems to be a huge learning curve involved, and every resource I've come across assumes either that you've worked with the data for quite a while or that you're smart enough to figure out what's left unsaid.

Even simple concepts, such as how the forward and reverse reads are generated or what exactly strand bias is, are buried under mountains of technical documentation.

I've checked out the BroadE videos, and they are quite useful. But is anyone aware of any other good introductory resources that give a broad overview of "how we do genomics", possibly highlighting the difficulties of determining what "truth" is and the hurdles we may encounter? A book or real-life examples would be nice.

Assembly sequence alignment genome snp • 1.0k views
5.6 years ago

Generally, if you want to get into genomics, you should have access to a Linux-style shell (Bash, sh, csh, zsh, etc.; all major operating systems now support one) and a machine with at least 8 GB of RAM (for whole-genome alignment). You should also know how to download and install programs such as SAMtools, BWA, and BCFtools.

-------------------------------

To learn a bit more about sequencing, you should know that the most widespread method is SBS (sequencing by synthesis), a technology that Illumina acquired when it purchased Solexa many years ago (Illumina 'never' invents anything on its own). Here is a video that goes over the SBS process:

-------------------------------------

But is anyone aware of any other good introductory resources that give a broad overview of "how we do genomics"

We do genomics by going into work and reading emails, prepping samples in the lab, processing data, attending meetings, etc.

----------------------------------------

and possibly highlighting the difficulties we might encounter with determining what "truth" is, and the hurdles we may encounter?

The 'truth' about which you speak is hidden in the cells, tissues, etc. that we study. We are limited in how precisely we can measure this truth by the very instruments that we use, each of which has its own detection sensitivity and associated error. The SBS method (mentioned earlier), for example, is fraught with error and, for good reason (from Illumina's perspective), the level of error is buried in documents not available to the public.
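
To make the idea of instrument error a bit more concrete: each base in a FASTQ file comes with a Phred quality score Q, which is the instrument's own estimate of the error probability, P = 10^(-Q/10). The sketch below is only an illustration, not part of any pipeline. It assumes the Phred+33 ASCII encoding used by current Illumina output, the function names are made up for this example, and the quality string is invented rather than taken from real data.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an estimated error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_fastq_quality(quality_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores.

    The offset of 33 (Phred+33) is an assumption; very old Illumina
    files used an offset of 64 instead.
    """
    return [ord(ch) - offset for ch in quality_line]


def expected_errors(quality_line: str) -> float:
    """Sum the per-base error probabilities to get the expected number of miscalled bases."""
    return sum(phred_to_error_prob(q) for q in decode_fastq_quality(quality_line))


if __name__ == "__main__":
    quals = "IIII5555##"  # invented quality string for a 10-base read
    for q in decode_fastq_quality(quals):
        print(f"Q{q}: estimated error probability {phred_to_error_prob(q):.4f}")
    print(f"Expected miscalled bases in this read: {expected_errors(quals):.2f}")
```

Keep in mind that these scores are themselves estimates reported by the instrument, so they describe the measurement, not the underlying 'truth'.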

----------------------------------------

For GATK, specifically, start here: Introduction to the GATK Best Practices

Kevin
