Hi there,
I want to generate NGS data to test and benchmark both germline and somatic variant calling. I've read a lot of papers about different tools and their benchmarks, but I'd like your feedback. After reading the papers, I have chosen two tools: VarSim and BAMSurgeon.
BAMSurgeon uses pre-existing BAM files and adds new variants to them. It has been widely used in the DREAM challenge for testing variant calling algorithms, so I assume that it works well. The advantage of using pre-existing BAM files is that you can start from real data and then introduce new variants for the benchmarking.
On the other hand, VarSim is able to generate read files, taking a reference genome and a set of variants as input. All the data here is purely simulated (although the variants can be random or previously described ones), and the advantage is that you can control different types of error (such as sequencing errors). Also, having fastq files makes it possible to test a full alignment + variant calling workflow.
In the end, what I would like to have is a set of tumor/normal pair fastq files with a true.vcf dataset, and then be able to adjust different parameters such as _clonality, heterogeneity, contamination, sequencing error_, and so on.
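To make the link between these parameters and the simulated data concrete, here is a minimal sketch of the allele fraction you would expect a spiked-in somatic variant to show for a given purity (1 minus normal contamination) and clonality. The function name and all parameter names are hypothetical and not taken from VarSim or BAMSurgeon; it assumes every tumour cell has the same local copy number and ignores sequencing error.

```python
def expected_vaf(purity, ccf, mutated_copies=1, tumor_cn=2, normal_cn=2):
    """Expected fraction of reads carrying a somatic variant.

    purity         -- fraction of cells in the sample that are tumour cells
    ccf            -- fraction of tumour cells carrying the variant (clonality)
    mutated_copies -- copies of the variant allele per mutated tumour cell
    tumor_cn       -- local copy number in tumour cells (assumed uniform)
    normal_cn      -- local copy number in contaminating normal cells
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be between 0 and 1")

    # Variant-carrying alleles come only from mutated tumour cells.
    variant_alleles = purity * ccf * mutated_copies
    # All alleles at the locus, from tumour and normal cells together.
    total_alleles = purity * tumor_cn + (1.0 - purity) * normal_cn
    if total_alleles == 0:
        raise ValueError("no copies of the locus in the sample")
    return variant_alleles / total_alleles


if __name__ == "__main__":
    # Heterozygous clonal SNV in a pure diploid tumour.
    print(expected_vaf(purity=1.0, ccf=1.0))
    # Same SNV in half of the tumour cells, with 40% normal contamination.
    print(expected_vaf(purity=0.6, ccf=0.5))
```

A value like this could serve as the target allele fraction when spiking variants in, and as a sanity check on the frequencies observed in the simulated tumor sample.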
Sorry if the question is too open or broad. I'd like to receive suggestions and personal experiences about the best way to generate this kind of data. If it is specific to exome/targeted sequencing, even better.
Thank you in advance,
Do you mean VarSim+Art when you say that you used Art?
Just ART from FASTA files. I created a script to generate the FASTA files, since VarSim only supports simple ins/del/dup/inv SVs.
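For illustration only (this is not the script referred to above), a rearranged FASTA of this kind could be built along the following lines. The sketch joins two reference sequences at chosen breakpoints to form the two derivative chromosomes of a reciprocal translocation and formats them as FASTA records. The function names and the toy sequences are hypothetical; a real script would read the chromosomes from the reference genome and pass the resulting file to a read simulator.

```python
def translocate(chrom_a, break_a, chrom_b, break_b):
    """Return the two derivative sequences of a reciprocal translocation.

    break_a and break_b are 0-based positions; each derivative keeps the
    left part of one chromosome and the right part of the other.
    """
    if not 0 <= break_a <= len(chrom_a) or not 0 <= break_b <= len(chrom_b):
        raise ValueError("breakpoint outside of sequence")
    der_a = chrom_a[:break_a] + chrom_b[break_b:]
    der_b = chrom_b[:break_b] + chrom_a[break_a:]
    return der_a, der_b


def format_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [">" + name]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequences standing in for two reference chromosomes.
    chr_a = "AAAACCCC"
    chr_b = "GGGGTTTT"
    der_a, der_b = translocate(chr_a, 4, chr_b, 4)
    print(format_fasta("derA", der_a), end="")
    print(format_fasta("derB", der_b), end="")
```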
Entire classes of somatic mutations (gene fusion, chromoplexy/chromothripsis/breakage-fusion-bridge, double minutes, ...) were missing from the simulators the last time I checked. By far the biggest issue I had with somatic simulations was the lack of aneuploidy and inter-chromosomal rearrangements. The majority of the cancers I've analysed were most definitely not simple diploid genomes with some SNVs and simple local rearrangements thrown in. 50+ copies of an unmutated oncogene is not unexpected for cancers showing signs of chromothripsis/breakage-fusion-bridge.
I'm wondering whether http://shiny.wehi.edu.au/cameron.d/sv_benchmark/ is still available. I'm not able to see the results.
Unfortunately not. We do have a benchmarking paper with more comprehensive results coming out soon.