Question

Advice on how to generate synthetic Copy Number Variation data

2.8 years ago

K.patel5 ▴ 150

Dear Biostars,

I have developed a CNV detection pipeline for my WES data. While I think it works reasonably well, I would like to test its performance against synthetic WES data containing CNVs that I have introduced myself. Does anyone have experience generating synthetic WES data from a FASTQ/BED file? Or would it be better to spike duplications or deletions into existing FASTQ files, which could then be used as controls? Any advice on tools that can do this would be really appreciated.

CNV genomics biology synthetic WES • 1.5k views

score 0 · Answer 1 · 2022-10-10

2.8 years ago

Prash ▴ 290

Dear K.patel5, this is excellent! Many exome pipelines predict putative/probable CNVs that may not be bona fide. Realistically, such calls are effectively synthetic, since they lack validation. You could treat them as positives to infer whether or not your pipeline performs well.

If your exome data yield fewer (and perhaps more accurate) CNVs, and you then consistently check them against WGS, that should be fine. I am not sure whether this approach sounds good, but to me it should be okay.

Our exomes have several such CNVs mapped too!

Regards Prash
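Checking exome CNV calls against a reference set (WGS calls, or intervals that were spiked in deliberately) can be made concrete with an overlap-based comparison. The following is a minimal sketch, not a method described in this thread: the `Interval` type, the `score_calls` helper, and the 50% reciprocal-overlap threshold are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no match)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def score_calls(
    calls: List[Interval],
    truth: List[Interval],
    min_overlap: float = 0.5,  # assumed threshold, adjust to taste
) -> Tuple[float, float]:
    """Return (precision, recall) of calls against the reference set."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
        for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
        for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example data, not taken from any real sample.
truth = [Interval("chr1", 100, 200, "DEL"), Interval("chr2", 500, 900, "DUP")]
calls = [Interval("chr1", 120, 210, "DEL"), Interval("chr3", 0, 50, "DEL")]
precision, recall = score_calls(calls, truth)
```

Exome calls often have imprecise breakpoints because coverage exists only over targets, so a looser threshold or a per-exon comparison may suit WES better than a strict reciprocal overlap.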

Thank you for your reply, @Prash. Yes, I think this kind of validation would greatly benefit our analysis, but we lack WGS data and only have WES samples. Are there any tools you could recommend for creating synthetic samples/spike-in controls?

Reply by K.patel5 ▴ 150 • 2.8 years ago

My pleasure. Around 2010, SLOPE was a wonderful tool, although the SVs called at the time were not especially precise: https://academic.oup.com/bioinformatics/article/26/21/2684/214667

Reply by Prash ▴ 290 • 2.8 years ago

Thanks @Prash. I can see that they demonstrate their detection tool by generating synthetic data, so I can follow this as a blueprint. I suppose there aren't many tools that can create deletions/duplications, and I will have to do this manually.

Reply by K.patel5 ▴ 150 • 2.8 years ago
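For the manual route, one simple way to mimic a deletion or duplication is to change read depth inside a target region of an existing control sample: drop a fraction of the overlapping reads for a loss, or emit extra copies for a gain. The sketch below only illustrates that idea on plain in-memory records. The `Read` and `Cnv` types and the `spike_in` function are hypothetical names, a diploid baseline is assumed, and real data would need the reads parsed from an aligned BAM, with mates of a pair kept or dropped together.

```python
import random
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Cnv(NamedTuple):
    chrom: str
    start: int
    end: int
    copy_number: int  # target copies; the assumed diploid baseline is 2


def overlaps(read: Read, cnv: Cnv) -> bool:
    """True if the read shares at least one base with the CNV region."""
    return (
        read.chrom == cnv.chrom
        and read.start < cnv.end
        and cnv.start < read.end
    )


def spike_in(reads: Iterable[Read], cnv: Cnv, seed: int = 0) -> Iterator[Read]:
    """Yield reads with depth inside the CNV scaled by copy_number / 2."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    whole, frac = divmod(cnv.copy_number / 2, 1.0)
    for read in reads:
        if not overlaps(read, cnv):
            yield read
            continue
        # Emit floor(ratio) copies, plus one more with probability frac.
        copies = int(whole) + (1 if rng.random() < frac else 0)
        for i in range(copies):
            # Extra copies get a new name so read IDs stay unique.
            yield read if i == 0 else read._replace(name=f"{read.name}_dup{i}")


# Hypothetical control sample: 100 reads of 100 bp tiled every 10 bp.
control = [Read(f"r{i}", "chr1", i * 10, i * 10 + 100) for i in range(100)]
het_deletion = list(spike_in(control, Cnv("chr1", 200, 400, 1)))
het_duplication = list(spike_in(control, Cnv("chr1", 200, 400, 3)))
```

This changes depth only. Duplicated reads are exact copies, so a duplicate-marking step would likely remove them, and no breakpoint-spanning or allele-frequency signal is created. Callers that rely on more than read depth would need a more realistic simulation.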

score 0 · Answer 2 · 2022-10-10

2.8 years ago

Prash ▴ 290

Yes, the best option would be to employ DeepVariant.
