I am trying to detect CNVs in a targeted sequencing dataset generated on the Ion Torrent platform. The data are as follows.
- Targeted region is a segment of the genome that spans multiple genes (not limited to exons).
- Single-end reads of variable length.
- Reads are mapped to the hg19 assembly.
- The samples are not paired. We have a set of test samples and a set of control samples.
So far, I have tried several tools, including CNVkit, CNV-seq, Control-FREEC and CODEX. None of them seems to work well with the kind of data I have.
Can anyone suggest an analysis strategy that I could try with such data?
Thanks.
Also, would I be better off using a whole-genome style of analysis, with the reads mapped only to the segment of the genome in question?
You could calculate the coverage per gene, exon or target; median-normalize it (across all samples or within each set); build a reference from samples assumed to have a copy number of two; calculate the ratio between each test sample and that reference; and then run a cluster analysis to detect CNVs per gene, exon or target. This is similar to what the freely available tools do, but I obtain more reliable results with an individualized pipeline for targeted NGS data.
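To make the steps above concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a tested pipeline. The target names and coverage values are made-up toy numbers, the function names are hypothetical, and the final step uses fixed log2-ratio cut-offs as a simple stand-in for a proper cluster analysis. The cut-offs (-0.5 and 0.4) are arbitrary and would need tuning on real data.

```python
import math
from statistics import median


def median_normalize(coverage):
    """Divide each target's coverage by the sample's median coverage."""
    m = median(coverage.values())
    if m <= 0:
        raise ValueError("median coverage must be positive")
    return {target: depth / m for target, depth in coverage.items()}


def build_reference(control_samples):
    """Per-target median of normalized coverage across the controls.

    The controls are assumed to carry two copies of every target.
    """
    normalized = [median_normalize(sample) for sample in control_samples]
    return {t: median(n[t] for n in normalized) for t in normalized[0]}


def log2_ratios(test_sample, reference):
    """log2(normalized test coverage / reference) per target."""
    norm = median_normalize(test_sample)
    return {
        t: math.log2(norm[t] / reference[t])
        for t in reference
        if norm[t] > 0 and reference[t] > 0  # skip targets with no coverage
    }


def call_state(log2_ratio, loss_cutoff=-0.5, gain_cutoff=0.4):
    """Crude threshold call; the cut-offs are illustrative assumptions."""
    if log2_ratio <= loss_cutoff:
        return "loss"
    if log2_ratio >= gain_cutoff:
        return "gain"
    return "neutral"


# Toy mean coverage per target (made-up numbers, for illustration only).
controls = [
    {"t1": 100, "t2": 200, "t3": 150, "t4": 120, "t5": 180},
    {"t1": 110, "t2": 190, "t3": 160, "t4": 130, "t5": 170},
    {"t1": 90, "t2": 210, "t3": 140, "t4": 110, "t5": 190},
]
test = {"t1": 50, "t2": 200, "t3": 150, "t4": 120, "t5": 180}

reference = build_reference(controls)
ratios = log2_ratios(test, reference)
calls = {target: call_state(r) for target, r in ratios.items()}

for target in sorted(calls):
    print(target, round(ratios[target], 2), calls[target])
```

In this toy example, target t1 has half the expected coverage in the test sample, so its log2 ratio is close to -1, which is what a heterozygous deletion would give in a diploid region. With real data the per-target coverage would come from the BAM files, and with few targets the median normalization itself can be skewed by a large CNV, so the normalization set should be chosen with care.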
Thanks for the suggestion. Can you suggest tools that we could use to perform the above steps?
I recommend R for this. Helpful packages include cn.mops, Gviz and exomeCopy, among others. You can find a comprehensive list in this nice paper: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S11-S1