Question

Somatic CNVs in silico evaluation

0


17 months ago

s.awasthy ▴ 30

Hello all, I am working on somatic copy number identification from WES data. The data are paired (tumor with matched normal), so I ran three tools (Control-FREEC, CNVkit, VarScan2) to identify somatic CNVs and then ran GISTIC2 on their outputs to get the recurrent SCNVs. I now have segmentation files from each of these tools. I want to know how I can compare the outputs of these tools and which filters I should apply to validate my results. I am working on gallbladder cancer data, and most previous studies have controlled access. I have two options for how to proceed; please suggest which approach would be suitable.

Should I download the available segmentation files from public repositories (TCGA, cBioPortal) and compare against them? Not much data exists for gallbladder cancer, so would I need to choose files from a similar cancer?
With reference to this paper, can I generate simulated data and run a sensitivity and specificity check? https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1705-x
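To make the second option concrete, here is a minimal Python sketch of a bin-level sensitivity/specificity check against a simulated truth set. It is only an illustration under stated assumptions: the function names, the 1 kb bin size, and the inclusive `(chrom, start, end)` tuples are all hypothetical, and it is not the evaluation procedure of the linked paper.

```python
def covered_bins(segments, bin_size=1000):
    """Return the set of (chrom, bin_index) pairs touched by the segments.

    Segments are (chrom, start, end) tuples with inclusive coordinates.
    """
    bins = set()
    for chrom, start, end in segments:
        for b in range(start // bin_size, end // bin_size + 1):
            bins.add((chrom, b))
    return bins


def sensitivity_specificity(truth, calls, chrom_lengths, bin_size=1000):
    """Compare called CNV segments with simulated (truth) segments per bin."""
    truth_bins = covered_bins(truth, bin_size)
    call_bins = covered_bins(calls, bin_size)
    # Total number of bins across all chromosomes considered.
    total = sum(length // bin_size + 1 for length in chrom_lengths.values())
    tp = len(truth_bins & call_bins)   # altered and called
    fn = len(truth_bins - call_bins)   # altered but missed
    fp = len(call_bins - truth_bins)   # called but not altered
    tn = total - tp - fn - fp          # neither altered nor called
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy example (made-up coordinates, not real data).
truth = [("1", 0, 2999)]
calls = [("1", 2000, 4999)]
sens, spec = sensitivity_specificity(truth, calls, {"1": 9999})
print(sens, spec)
```

Counting per bin rather than per segment keeps the comparison independent of how each tool splits a region into segments.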

Thanks

cnv wes • 1.5k views

ADD COMMENT • link 16 months ago by s.awasthy ▴ 30

score 0 · Answer 1 · 2023-07-08

0


17 months ago

Prash ▴ 280

Hi Awasthy, could you try SnpEff to find the variants? That should be the first step, and then varscan filter or compare should work.

Best, Prash

ADD COMMENT • link 17 months ago by Prash ▴ 280

0


Hi Sir, I have already run SnpEff in combination with ANNOVAR for variant annotation.

ADD REPLY • link 17 months ago by s.awasthy ▴ 30

score 0 · Answer 2 · 2023-07-11

0


17 months ago

Prash ▴ 280

Ok. Was TNM grading checked? In that case you can consider such similar cancer files, but not before you have statistical significance.

However, in any case you cannot consider this for precision or individual risk assessment, IMHO.

Prash

ADD COMMENT • link 17 months ago by Prash ▴ 280

0


Hi Sir, thanks for the suggestion. As I am working with BAM files, the approach I have used is described below.

Using Control-FREEC, I generated segmentations for all the required samples (30). My output looks like this (only the first few lines are shown):

Sample_1 1 65510 1722599 218225 -0.436195608872496
Sample_1 1 1725102 1751321 7717 0.121731595872128
Sample_1 1 1751515 3625679 147483 -0.404929881500916
Sample_1 1 3627508 16756006 763405 -0.267911447214069
Sample_1 1 16756331 16946710 7146 -0.866328637821233
Sample_1 1 16965751 21473402 267655 -0.291576124002509
Sample_1 1 21474941 21484434 2042 0.228517039518985
Sample_1 1 21553947 23194281 129185 -0.305160804831592
Sample_1 1 23309741 25976613 189104 -0.205312152785954

Using cnvkit:

Sample_1 chr1 14621 1334287 308 -0.0413884
Sample_1 chr1 1334288 1614850 182 0.00766328
Sample_1 chr1 1615351 1633402 16 -0.261991
Sample_1 chr1 1633414 3499330 538 -0.00117815
Sample_1 chr1 3499623 3501383 5 -0.0824541
Sample_1 chr1 3501785 16683511 2762 0.0277874
Sample_1 chr1 16683553 16954947 72 -0.515937
Sample_1 chr1 16955391 21423811 1032 0.014945
Sample_1 chr1 21424037 21429875 12 -0.227662
Sample_1 chr1 21429878 21472956 18 0.00216148
Sample_1 chr1 21473402 25343123 1028 0.0382977

I would like to find the SCNAs common to these tools and use that file in GISTIC2 to find recurrent alterations. I also want to filter out the germline CNVs.
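One possible way to build such a consensus is sketched below in Python. It assumes the six columns shown above are sample, chromosome, start, end, marker count and log2 ratio, strips the `chr` prefix so the two tools' chromosome names match, and keeps only the overlapping part of segments that both tools call in the same direction. The function names and the 0.2 log2 threshold are hypothetical choices for illustration, not recommended values.

```python
from collections import defaultdict


def parse_seg(lines):
    """Parse rows of: sample, chrom, start, end, n_markers, log2 ratio."""
    segs = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers or malformed rows
        sample, chrom, start, end, _n_markers, log2 = fields
        # Normalise "chr1" and "1" to the same chromosome name.
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        segs[(sample, chrom)].append((int(start), int(end), float(log2)))
    return segs


def call_state(log2, threshold=0.2):
    """Label a segment as gain, loss or neutral (threshold is an assumption)."""
    if log2 >= threshold:
        return "gain"
    if log2 <= -threshold:
        return "loss"
    return "neutral"


def consensus(segs_a, segs_b, threshold=0.2):
    """Return regions where both tools agree on a gain or a loss."""
    out = []
    for key in sorted(segs_a.keys() & segs_b.keys()):
        for s1, e1, r1 in segs_a[key]:
            state = call_state(r1, threshold)
            if state == "neutral":
                continue
            for s2, e2, r2 in segs_b[key]:
                lo, hi = max(s1, s2), min(e1, e2)
                if lo <= hi and call_state(r2, threshold) == state:
                    out.append((key[0], key[1], lo, hi, state))
    return out


# Two rows from each output posted above.
freec = parse_seg([
    "Sample_1 1 1725102 1751321 7717 0.121731595872128",
    "Sample_1 1 16756331 16946710 7146 -0.866328637821233",
])
cnvkit = parse_seg([
    "Sample_1 chr1 1615351 1633402 16 -0.261991",
    "Sample_1 chr1 16683553 16954947 72 -0.515937",
])
print(consensus(freec, cnvkit))
# [('Sample_1', '1', 16756331, 16946710, 'loss')]
```

The nested loop is quadratic per chromosome, which is fine for a few thousand segments; for larger inputs an interval-tree or sorted-sweep approach would be the usual replacement.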

Approach used for filtering germline CNVs: I identified significant calls using the Control-FREEC significance script (included with the Control-FREEC tools), filtered the calls by p-value (0.001), and intersected them with the DGV database (hg38, 80% reciprocal overlap). However, I got a DGV match for almost every call, so how can I tell whether a call is somatic or germline?

Please suggest how I can compare the segmentation files from these tools and filter out the germline calls.
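For reference, the reciprocal-overlap criterion mentioned above can be written out as a small Python sketch. The function names and the `(chrom, start, end)` tuples with inclusive coordinates are assumptions for illustration; real DGV records would need to be parsed into this shape first. One thing it makes easy to check is whether the overlap is really required in both directions, because a one-sided test lets a small call inside a large database entry count as a match.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions (inclusive coords)."""
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return 0.0
    len_a = a_end - a_start + 1
    len_b = b_end - b_start + 1
    return min(overlap / len_a, overlap / len_b)


def matches_known_cnv(call, known, min_fraction=0.8):
    """True if the call reciprocally overlaps any known CNV on its chromosome."""
    chrom, start, end = call
    return any(
        k_chrom == chrom
        and reciprocal_overlap(start, end, k_start, k_end) >= min_fraction
        for k_chrom, k_start, k_end in known
    )


# Toy coordinates (not real DGV entries).
known = [("1", 1, 10000), ("1", 120, 219)]
print(matches_known_cnv(("1", 100, 199), known))    # True, via the second entry
print(matches_known_cnv(("1", 5000, 5099), known))  # False, only 1% of the large entry
```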

ADD REPLY • link 16 months ago by s.awasthy ▴ 30

score 0 · Answer 3 · 2023-08-11

0


16 months ago

Prash ▴ 280

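Hi S Awasthy, what you could do is apply the filters with awk one-liners. For example, to label each cnvkit segment by whether its sample and chromosome (columns 1 and 2) also appear in the Control-FREEC output, printing the whole line so that the segment value stays in column 6:

awk 'NR==FNR {seen[$1,$2]; next} {chr=$2; sub(/^chr/,"",chr); print $0, ((($1,chr) in seen) ? "CONSENSUS HITS" : "MISMATCHES")}' control-free.out cnvkit.out > consensus.out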

Then keep the best hits by applying a cutoff on the segment value (column 6), for example:

awk '$6<=0' consensus.out

Please note that the cutoff you choose for the segment value is your prerogative for germline/somatic calls.