Question

Compare mutation frequencies in TCGA and cBioPortal

3.9 years ago

P ▴ 10

I am looking to compare mutation frequencies between two databases, for example TCGA and cBioPortal, for the same set of genes.

I understand that Fisher's exact test may be the most suitable method, but I am not able to wrap my head around the actual input data needed for it.

Sample information I have:

Gene    cbioportal_n    cbioportal_mutation_Freq_per_model    TCGA_n    TCGA_mutation_Freq
Gene1   91              76.0%                                 128       72.7%

Also, is there a way to automate this process in a script, using R or Python?

Thanks!

TCGA frequency mutation Fisher's exact • 748 views

updated 3.9 years ago by Ram 44k • written 3.9 years ago by P ▴ 10

What is the purpose of this exercise? As far as I understand, cBioPortal is not a data source; it's a place to look at data collected elsewhere. Strictly speaking, TCGA is not a raw data source either, but it is the closest you can get to one. cBioPortal operates on various TCGA datasets, so this would be a comparison among subsets of the same dataset.

3.9 years ago by Ram 44k

I should perhaps edit the title to "Compare frequency of gene mutations in two different databases (internal db vs TCGA)"; I simply took TCGA and cBioPortal as an example. The idea is to see whether there are any significant differences in the occurrence of gene mutations between the two databases. Thanks!

3.9 years ago by P ▴ 10

Ah, I see. That makes a lot more sense.

I don't see how that can be done in a statistically meaningful manner though - after all, per gene, you'd have just two numbers - freq_in_internal_db and freq_in_external_db. How would you do Fisher's test here? Are you testing if genes on average have a higher mutation rate in one cohort vs the other?

3.9 years ago by Ram 44k
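
On the input question: the table in the original post gives a cohort size (n) next to each frequency, so approximate counts of mutated and non-mutated samples can be recovered per gene. Those four counts form the 2x2 table that Fisher's exact test expects. Below is a minimal sketch of that idea in plain Python, not a vetted pipeline. It assumes the percentages are rounded values of mutated / n, and that the two cohorts are independent (which would not hold if one database is a subset of the other). The function names and the `genes` list are hypothetical, made up for illustration.

```python
from math import comb


def counts_from_frequency(n, freq_percent):
    """Recover approximate (mutated, not_mutated) counts from n and a percentage.

    Assumption: freq_percent is a rounded value of 100 * mutated / n.
    """
    mutated = round(n * freq_percent / 100)
    return mutated, n - mutated


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with p_obs are counted.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(p_value, 1.0)


# Hypothetical input, one tuple per gene, taken from the table in the question:
# (gene, cohort1_n, cohort1_freq_percent, cohort2_n, cohort2_freq_percent)
genes = [
    ("Gene1", 91, 76.0, 128, 72.7),
]

results = {}
for gene, n1, f1, n2, f2 in genes:
    mut1, wt1 = counts_from_frequency(n1, f1)
    mut2, wt2 = counts_from_frequency(n2, f2)
    # Rows: cohort 1, cohort 2. Columns: mutated, not mutated.
    results[gene] = fisher_exact_two_sided(mut1, wt1, mut2, wt2)
    print(gene, [[mut1, wt1], [mut2, wt2]], results[gene])
```

For Gene1 this gives the table [[69, 22], [93, 35]]. If SciPy is available, `scipy.stats.fisher_exact` performs the same test on such a table and would be the more usual choice. When the loop runs over many genes, the resulting p-values would normally need a multiple-testing correction (for example Benjamini-Hochberg) before any of them is called significant.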