Comprehensive Intro To Next Generation Sequencing
14.2 years ago
bow ▴ 790

Back when Sanger sequencing was the most popular method, analyzing the sequence data was a simple task that required relatively little background. Now, with the rapid and overwhelming development of next-generation sequencing, even learning about it seems like a daunting task. As a person still very much unfamiliar with the field, I feel lost in the amount of information that I need to digest: the assembly and alignment methods, the number of available tools, the pipelines, the different platforms, etc. I'm not sure where I should start!

So my question would be: What is the most comprehensive resource to learn about next-generation sequencing data analysis? Preferably using SOLiD and/or Illumina (going from the most raw data to the final sequence).

I realize the answer might be very long-winded, but what I'm looking for is at least pointers so I can figure out what I need to know next on my own. I very much want to be able to analyze and interpret the data flood that is coming out of the field, but I'm pretty clueless right now. The things I know I learned from various separate sources, and sometimes it's hard to tie them together. So I think this information would be a huge help for me (and, I'm sure, for other initiates alike :) ).

A little bit of background: I'm familiar with UNIX (Linux), know a bit of Python and Java (if that helps). Currently I am doing a research project that involves building and assembling SOLiD and Illumina RNA-seq data to a reference genome sequenced using 454.

next-gen sequencing solid illumina • 5.8k views
14.2 years ago
User 59 13k

I find an RSS feed to the bioinformatics section of SeqAnswers to be pretty much invaluable, along with a subscription to the mailing lists for all the tools I use (MAQ, bowtie, bwa, etc.) and an eye on bioc-sig-sequencing for R/Bioconductor-related stuff.

What I have found recently in the field is that there is no established best practice, the tools ecosystem has not yet undergone enough selection to produce clear winners for most NGS tasks, and there is a paucity of documentation. I think this is to be expected in a rapidly moving field, so there is currently no substitute for 1) experience and 2) an eye on what everyone else is doing.

Most of my recent 'oh really?' moments have come from chatting with other bioinformaticians doing similar analyses with similar aims (in my case, variant detection from exome capture).

Just my $0.02

EDIT: There's also a recent set of summary papers in Briefings in Bioinformatics


SeqAnswers seems golden, thanks! Briefings in Bioinformatics looks great, too!

14.2 years ago
Ian 6.1k

It might be useful for you to check out the GALAXY 'NGS TOOLBOX' at http://main.g2.bx.psu.edu/. It contains a selection of analysis tools for different aspects of NGS, e.g. variant analysis, ChIP-seq. There is also a section for QC (quality control) of reads, which is an essential step. This is only a small selection of what is available, but it will give you a feel for what can be done. You may also find some of the GALAXY "Quickie" movies of use.
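To give a feel for what a read QC step looks at, here is a minimal Python sketch that flags reads with a low mean base quality. It assumes four-line FASTQ records and Phred+33 quality encoding (some older Illumina pipelines used an offset of 64 instead, so check your data); the function names and the threshold of 20 are illustrative assumptions, not part of any Galaxy tool.

```python
def mean_phred(quality_string, offset=33):
    """Return the mean Phred score of one FASTQ quality line.

    The offset of 33 is an assumption; pass offset=64 for older
    Illumina data that used Phred+64 encoding.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def low_quality_reads(fastq_lines, min_mean_quality=20, offset=33):
    """Yield names of reads whose mean quality is below the threshold."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    # A FASTQ record spans four lines: @name, sequence, +, qualities.
    for i in range(0, len(lines) - 3, 4):
        name, quality = lines[i][1:], lines[i + 3]
        if not quality:
            continue  # skip empty reads rather than divide by zero
        if mean_phred(quality, offset) < min_mean_quality:
            yield name


# Hypothetical two-read example: 'I' encodes Q40 and '!' encodes Q0.
records = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "!!!!"]
flagged = list(low_quality_reads(records))
```

In practice you would pass an open file handle instead of a list, e.g. `low_quality_reads(open("reads.fastq"))`, where the file name is only a placeholder.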

You may also want to familiarise yourself with the SAM/BAM format, which makes data analysis and storage easier: http://samtools.sourceforge.net/.
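As a rough illustration of what a SAM file contains, each alignment line is tab-separated with 11 mandatory fields, and the FLAG field is a bit mask (0x4 means the read is unmapped, 0x10 means it aligned to the reverse strand). The Python sketch below parses one such line by hand; the helper names and the example read are made up for illustration, and for real work you would more likely use samtools itself or an existing library.

```python
from collections import namedtuple

# The 11 mandatory fields of a SAM alignment line, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line (not a header)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(record):
    """FLAG bit 0x4 is set when the read has no alignment."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record):
    """FLAG bit 0x10 is set when the read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


# Hypothetical alignment: a 4-base read mapped to the reverse strand of chr1.
example = "read1\t16\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(example)
```

Header lines start with `@` and should be skipped before calling `parse_sam_line`.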

In terms of viewing data, I like to upload my data to the UCSC browser http://genome.ucsc.edu/, but other standalone browsers such as IGV http://www.broadinstitute.org/igv/ or Savant http://compbio.cs.toronto.edu/savant/index.html have their own strengths.

Have fun!


I have got more use out of IGV and UCSC than I have out of Galaxy. Does anyone enjoy using Galaxy? I see its point, but dislike using it when there is a Unix shell available.


I really do like Galaxy for downstream (post-mapping) analyses where the comparison of genome coordinates is involved. I can easily see all the comparisons I have done. I think it is a very easy way to access coordinate-based data sets, e.g. UCSC conservation info, alignments, etc.
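For anyone wondering what "comparison of genome coordinates" boils down to, the core operation is an interval overlap test. Here is a minimal Python sketch, assuming intervals are (chromosome, start, end) tuples with 0-based, half-open coordinates as in BED files; the function names and example intervals are hypothetical, and this is not how Galaxy implements it.

```python
def overlap_length(a, b):
    """Return the number of bases shared by two (chrom, start, end) intervals.

    Coordinates are assumed to be 0-based and half-open (BED style), so
    intervals that merely touch end-to-start share zero bases.
    """
    if a[0] != b[0]:
        return 0  # different chromosomes never overlap
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def features_hit(regions, features):
    """Return the features overlapped by at least one region.

    This is a simple all-against-all scan, fine for small lists but
    too slow for whole-genome data sets.
    """
    return [f for f in features
            if any(overlap_length(r, f) > 0 for r in regions)]


# Hypothetical mapped regions and annotated features.
regions = [("chr1", 100, 200), ("chr2", 50, 80)]
features = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 0, 40)]
hits = features_hit(regions, features)
```

Only the first feature is hit here: the second starts exactly where the chr1 region ends, and the third ends before the chr2 region begins.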


I haven't really used it for that, I must admit. I'll have to take a look at that functionality.

14.2 years ago

Also have a look at the Bioinformatics NGS virtual issue. It keeps track of a range of the latest tools.


NAR? Bioinformatics, surely.


Oops; well spotted.


This is completely off-topic but Alastair, I believe you and I were lab partners back in Edinburgh, 1989-93. Hello! Good to see you also found your way to bioinformatics.


Hi Neil! I've been in the field since my PhD back in 1993: I never did like bench science :>

Can you send personal messages via this web site?
