Question

Analysing Multiple Exome Datasets



14.2 years ago

User 9126 ▴ 50

Dear All, we have whole exome sequencing data for ten samples from Illumina. While looking for a pipeline to analyse this data, I found a nice thread: http://biostar.stackexchange.com/questions/1269/what-is-the-best-pipeline-for-human-whole-exome-sequencing

However, I am a little confused about how to apply this pipeline to all 10 samples together.

Should I first align each file to the genome independently, combine the results into a single BAM file, and proceed from there?

Or should I process every file independently through all of these steps? If so, what should I do at the end to interpret the results across all 10 samples?

Thanks

Santhosh

exome sequencing

updated 14.2 years ago by Sean Davis 27k • written 14.2 years ago by User 9126 ▴ 50

score 5 · Answer 1 · 2011-06-20

There are (at least) two places in processing and analyzing exome data that benefit from borrowing information from other samples. The first is realignment around indels, since reads from all samples can be used as support when searching for indels. The second is variant calling: several variant callers accept multiple BAM files as input so that all of the available information is used at once. That said, our group processes samples largely independently and combines them only at the variant-calling step, where a single merged BAM file is not necessary because the caller can take the per-sample BAMs directly.
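As a rough illustration of the "process independently, combine only at calling" approach, here is a minimal Python sketch that just builds the shell commands. The tool choices (`bwa mem`, `samtools sort`/`index`, `bcftools mpileup | bcftools call`) and all file names are assumptions reflecting common current practice, not what the answerer's group used; the cross-sample indel realignment step is not shown. The key detail is the `@RG` read group with an `SM` (sample) tag: it lets a multi-sample caller tell samples apart, which is what makes a merged BAM unnecessary.

```python
"""Sketch: align each exome sample on its own, then call variants jointly.

Assumptions (not from the original answer): bwa mem for alignment, samtools
for sorting/indexing, bcftools for multi-sample calling. File names are
hypothetical placeholders.
"""
import shlex
from typing import Dict, List, Tuple


def per_sample_commands(sample: str, fq1: str, fq2: str, ref: str) -> List[str]:
    """Align and sort one sample; no other sample is needed at this stage."""
    # The SM tag records which sample each read came from, so the joint
    # caller can keep the samples separate later.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    bam = f"{sample}.sorted.bam"
    return [
        f"bwa mem -R {shlex.quote(read_group)} {ref} {fq1} {fq2} "
        f"| samtools sort -o {bam} -",
        f"samtools index {bam}",
    ]


def joint_call_command(bams: List[str], ref: str, out_vcf: str) -> str:
    """Pass every per-sample BAM to a single caller run (no merged BAM)."""
    return (
        f"bcftools mpileup -f {ref} -Ou {' '.join(bams)} "
        f"| bcftools call -mv -Oz -o {out_vcf}"
    )


def build_pipeline(samples: Dict[str, Tuple[str, str]], ref: str) -> List[str]:
    """Per-sample steps for every sample, followed by one joint calling step."""
    commands: List[str] = []
    for name, (fq1, fq2) in samples.items():
        commands.extend(per_sample_commands(name, fq1, fq2, ref))
    bams = [f"{name}.sorted.bam" for name in samples]
    commands.append(joint_call_command(bams, ref, "cohort.vcf.gz"))
    return commands


if __name__ == "__main__":
    # Ten hypothetical paired-end samples, as in the question.
    samples = {f"S{i}": (f"S{i}_R1.fq.gz", f"S{i}_R2.fq.gz") for i in range(1, 11)}
    for cmd in build_pipeline(samples, "ref.fa"):
        print(cmd)
```

The per-sample commands are independent, so they can run in parallel or on separate machines; only the final calling step needs all ten BAMs, and the resulting multi-sample VCF is where results across samples are compared.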

At the end of the day, though, it pays to think carefully about your study and its design when deciding how to proceed with an analysis. While most people talk about a "pipeline" for exome sequencing, it is easy to find situations (related individuals, for example) where a "general pipeline" is not optimal.