Hi everyone,
I want to know the frequency of CNV events for all genes in breast cancer using COSMIC data. For this purpose, COSMIC generates a nice table based on the chosen filters, such as cancer type (see http://cancer-beta.sanger.ac.uk/cosmic/browse/tissue#sn=breast&ss=all&hn=all&sh=all&in=t&src=tissue&all_data=n&wgs=off). The table can be viewed under the Variants -> CNV & Expression tab in the "Cosmic » Cancer Browser » Breast" section.
I realized that the above table sometimes contains multiple entries for the same gene. For instance, it shows 7 entries for ERBB2, as shown at this link: https://ibb.co/doK5iQ
My question is: which entry should be used to compute the frequency of gains and losses in this case?
Thanks
Just in case someone else is wondering about this issue as well: I am currently selecting the row with the maximum number of samples in the "Tested" column. This rests on the assumption that the row with the most tested samples most probably also includes the samples counted in the other rows for that gene.
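For anyone who wants to script this, here is a minimal sketch of the selection rule described above, assuming the table has already been exported and parsed into one dict per row. Only "Tested" is a real column name from the COSMIC table as I see it; the keys "gene", "tested", "gain" and "loss", the function names, and all the numbers in toy_rows are placeholders I made up for illustration, not real COSMIC values.

```python
from typing import Dict, List


def pick_row_with_most_tested(rows: List[dict]) -> Dict[str, dict]:
    """For each gene, keep the row with the largest 'tested' count.

    On ties, the first row encountered is kept.
    """
    best: Dict[str, dict] = {}
    for row in rows:
        gene = row["gene"]
        if gene not in best or row["tested"] > best[gene]["tested"]:
            best[gene] = row
    return best


def cnv_frequencies(rows: List[dict]) -> Dict[str, Dict[str, float]]:
    """Gain and loss frequency per gene, computed from the selected row only."""
    result: Dict[str, Dict[str, float]] = {}
    for gene, row in pick_row_with_most_tested(rows).items():
        tested = row["tested"]
        if tested == 0:
            # No tested samples, so a frequency is undefined; skip the gene.
            continue
        result[gene] = {
            "gain": row["gain"] / tested,
            "loss": row["loss"] / tested,
        }
    return result


# Made-up toy rows (NOT real COSMIC numbers), with a duplicated gene.
toy_rows = [
    {"gene": "ERBB2", "tested": 100, "gain": 20, "loss": 5},
    {"gene": "ERBB2", "tested": 40, "gain": 10, "loss": 1},
    {"gene": "TP53", "tested": 50, "gain": 0, "loss": 5},
]

freqs = cnv_frequencies(toy_rows)
for gene, f in freqs.items():
    print(gene, f)
```

Note that this only implements my working assumption. If the duplicate rows actually come from non-overlapping sample sets, summing the counts across rows would be the better choice, so it is worth checking what distinguishes the rows before relying on this.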