What Are The 'Copy Number Detection' Tools Out There For Exome Capture NGS Data?
13.1 years ago
Prateek ★ 1.0k

Do you know of any CNV detection tools for NGS paired-end exome data, using either the coverage method (window based) or the paired-end mapping method (clustering based)? I am aware it's a tough problem to solve, and I have looked at some tools for whole genomes but couldn't find one for exomes.

I would also welcome discussion about how existing tools could be repurposed for exomes through post-processing (like ignoring exon boundaries).

Finally, please feel free to point out tools for structural variants (inversions, translocations etc.) too.

cnv copynumber next-gen sequencing variant structural • 24k views
13.0 years ago
Ryan D ★ 3.4k

Take a look at the supplementary information from the 1000G paper located here.

They use something like 15-17 algorithms, including read-pair analysis (RP), read-depth analysis (RD), split-read analysis (SR), and sequence assembly (AS).

Those are broken down in Tables 2A and 2B of the supplement.

In brief:

Read depth: Event-wise testing, CNVnator

Read pair: Spanner, PEMer, BreakDancer

Split read: Mosaik, Pindel

PD (read pair/read depth): Spanner, Genome STRiP

There is also a 1000 Genomes tutorial on structural variants by Jan Korbel:

Video:

Slides: http://www.genome.gov/Pages/Research/DER/1000GenomesProjectTutorials/StructuralVariants-JanKorbel.pdf

A bit dated, but it can get you started.

12.3 years ago
SBinson ▴ 110

cn.MOPS works well for this task.


Another vote for cn.mops. I wish I had known about it when I supplied my original answer.


And another vote for cn.mops. A big bonus for me is that it even works with small datasets (5-7 samples).


cn.mops performed very well for detecting CNVs in free circulating cancer DNA.


cn.mops works very well for analyzing exome sequencing data from cancer genomes.


I've had great luck with cn.mops, and it's relatively easy to use, even for an R newbie. Also, Günter (the software's author) is very helpful and responsive!

13.1 years ago
Vitis ★ 2.6k

There are several strategies for finding structural variants (SVs) in genomic or exome NGS data. First, with paired-end data, you can mine the distribution of insert sizes between read pairs and infer SVs from pairs with unusual insert sizes. Second, you can scan the genome/exome for regions with unusually high or low coverage. This is the only approach with which you can estimate the copy number (I don't know how accurate that is). Third, you can use the reads that get split during mapping, which may fall into SV regions. Finally, de novo assembly followed by traditional comparative genomics approaches can also help with SV discovery. Of course, you can combine all of these approaches and keep the candidates with the highest confidence.
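
As a rough illustration of the first strategy (insert-size outliers), here is a minimal Python sketch. It is not taken from any of the tools mentioned in this thread. The function name `flag_discordant_pairs`, the `n_mads` threshold, and the example numbers are all hypothetical, and the insert sizes are assumed to have been extracted from the alignments already.

```python
from statistics import median


def flag_discordant_pairs(insert_sizes, n_mads=5.0):
    """Return the indices of read pairs with an unusual insert size.

    The median and the median absolute deviation (MAD) are used so that
    a few extreme pairs do not distort the estimate of the library's
    typical insert size.
    """
    sizes = list(insert_sizes)
    center = median(sizes)
    mad = median([abs(x - center) for x in sizes])
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the insert sizes are roughly normally distributed.
    spread = 1.4826 * mad
    if spread == 0:
        # No usable scale estimate (hypothetical choice: flag nothing).
        return []
    return [i for i, x in enumerate(sizes) if abs(x - center) > n_mads * spread]


# Made-up insert sizes; the pair at index 6 spans far more than the rest,
# which is the pattern a deletion between the mates would produce.
example_sizes = [300, 310, 295, 305, 290, 300, 2500, 298]
discordant = flag_discordant_pairs(example_sizes)
```

Real callers go further and cluster discordant pairs that support the same event, which this sketch does not attempt.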

I heard CNVnator is a pretty good coverage-based tool for genomic data, but I'm not sure whether it's gonna perform well with exome data. Considering the size and distribution of exons, the split-read method seems attractive. My personal experience involves a genomic data set: we assembled the genomic reads de novo, used a traditional method like MUMmer to identify the SVs, and verified them with coverage-based approaches. It worked quite well, but I don't know how de novo assembly would perform for exomes (I heard the Trinity pipeline is rising as a good tool for de novo assembly of transcriptomes or exomes).

There is a nice review in Nature Reviews Genetics. It covers everything I mentioned and much more. http://www.nature.com/nrg/journal/v12/n5/full/nrg2958.html


CNV-calling programs designed for whole genomes will almost certainly not work on exomes: the data is sparse, the depths are variable due to capture affinities, etc.


Split-read methods will be extremely limiting with exome data. To work, they need the breakpoint to lie within the exon or within the roughly 50 to 100 bp of "splash" on either side of the exon. It is far more likely that the breakpoint is rather far away from the exon, yet the event affects the exon as well. Normalizing and comparing depth of coverage among multiple samples is your best bet.
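
The depth comparison suggested here can be sketched in Python as follows. This is only a toy illustration under stated assumptions, not how any tool in this thread actually works: the function name `exon_log2_ratios`, the `pseudocount` parameter, and the read counts are hypothetical, and per-exon read counts are assumed to be available for every sample in the same exon order.

```python
import math
from statistics import median


def exon_log2_ratios(counts, pseudocount=0.5):
    """Compare each sample's per-exon depth against the cohort.

    counts maps a sample name to a list of per-exon read counts.
    Returns a dict mapping each sample to its per-exon log2 ratios.
    """
    samples = list(counts)
    n_exons = len(counts[samples[0]])

    # Step 1: scale each sample to counts per million so that samples
    # sequenced to different depths become comparable.
    cpm = {}
    for s in samples:
        total = sum(counts[s])
        cpm[s] = [c / total * 1e6 for c in counts[s]]

    # Step 2: use the per-exon median across samples as the reference,
    # which absorbs exon-specific capture efficiency.
    reference = [median([cpm[s][i] for s in samples]) for i in range(n_exons)]

    # Step 3: a log2 ratio near 0 is consistent with the cohort,
    # clearly negative suggests a loss, clearly positive a gain.
    return {
        s: [
            math.log2((cpm[s][i] + pseudocount) / (reference[i] + pseudocount))
            for i in range(n_exons)
        ]
        for s in samples
    }


# Made-up counts: sample C has half the expected reads on the third exon.
example_counts = {
    "A": [100, 200, 300, 400],
    "B": [200, 400, 600, 800],
    "C": [100, 200, 150, 400],
}
ratios = exon_log2_ratios(example_counts)
```

Note that scaling by total reads shifts the other exons of sample C slightly above 0, which is one reason real tools use more careful normalization and more samples.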

13.1 years ago
User 59 13k

Another vote for ExomeCNV. There's also CNAseg and CNV-seq (although I'm not sure of their appropriateness for exome data). I've also seen CNVnator mentioned on SEQanswers in relation to this question, but I think that Chris's point about variable depth means this is certainly a trickier proposition than for WGS.

EDIT:

I've also just seen an abstract for another Bioconductor package based on an HMM approach. exomeCopy is the package.

10.0 years ago
Eric T. ★ 2.8k

Are you looking for CNVs in a population, or disease-causing copy number alterations in individual tumor or constitutional samples?

For the former, most of the answers already posted here, including cn.MOPS, will do.

For the latter, particularly tumor samples, CNVkit is a program I wrote recently that performs well.

There are lots of these tools tailored for slightly different purposes, and it's a good idea to look for recent papers that independently benchmark several of them at once.

11.6 years ago
fromer ▴ 20

We've written the XHMM software for calling CNVs from exomes: http://atgu.mgh.harvard.edu/xhmm/

Our paper describing this was published last year in AJHG: http://www.cell.com/AJHG/abstract/S0002-9297%2812%2900417-X


Can you say how many BAM files would be required to have reliable calling?

10.0 years ago
chongchu.cs ▴ 10

We have a tool for calling genotypes of insertions and deletions for WGS. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0113324


I'd be interested in testing your algorithm, but will it also work for exome data? If not, I'll look at the source code to see how it works.

By the way, I'm analysing wildtype/tumor samples.
