Compare mutation frequencies in TCGA and cbioportal
3.9 years ago
P ▴ 10

I am looking to compare mutation frequencies between two databases, for example TCGA and cBioPortal, for the same set of genes.

I understand that Fisher's exact test may be the most suitable approach, but I can't work out what input data it actually needs.

Sample information I have:

Gene    cbioportal_n    cbioportal_mutation_Freq_per_model    TCGA_n    TCGA_mutation_Freq
Gene1   91              76.0%                                 128       72.7%

Also, is there a way to automate this process in a script, in R or Python?

Thanks!

TCGA frequency mutation Fisher's exact • 748 views

What is the purpose of this exercise? As far as I understand, cBioPortal is not a data source; it's a place to look at data collected elsewhere. Strictly speaking, TCGA is not a raw data source either, but it is the closest you can get to one. cBioPortal operates on various TCGA datasets, so this would be a comparison among subsets of the same dataset.


I should perhaps edit the title to "Compare frequency of gene mutations in two different databases (internal db vs TCGA)"; I simply took TCGA and cBioPortal as an example. The idea is to see whether there are any significant differences in the occurrence of gene mutations between the two databases. Thanks!


Ah, I see. That makes a lot more sense.

I don't see how that can be done in a statistically meaningful manner, though. After all, per gene, you'd have just two numbers: freq_in_internal_db and freq_in_external_db. How would you do Fisher's test here? Are you testing whether genes on average have a higher mutation rate in one cohort vs the other?
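
One possible way to frame it, assuming the "_n" columns in the question are the number of samples in each cohort: each percentage can be converted back into mutated and non-mutated counts, which gives a 2x2 table per gene (cohort x mutation status) that Fisher's exact test can take as input. The sketch below is only an illustration under that assumption. The function names are made up for this example, the counts are recovered by rounding n * freq (so they may be off by one if the published percentage was computed differently), and the test is written with the standard library only; in practice scipy.stats.fisher_exact would be the usual choice.

```python
from math import comb

def counts_from_freq(n, freq_percent):
    """Recover (mutated, not_mutated) from cohort size and percent frequency.

    Assumption: the frequency is mutated_samples / n, so rounding recovers the count.
    """
    mutated = round(n * freq_percent / 100)
    return mutated, n - mutated

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Rows: (gene, cohort A n, cohort A freq %, cohort B n, cohort B freq %)
# Only the Gene1 row comes from the question.
rows = [("Gene1", 91, 76.0, 128, 72.7)]

results = {}
for gene, n_a, f_a, n_b, f_b in rows:
    mut_a, wt_a = counts_from_freq(n_a, f_a)
    mut_b, wt_b = counts_from_freq(n_b, f_b)
    results[gene] = fisher_exact_two_sided(mut_a, wt_a, mut_b, wt_b)
    print(gene, [[mut_a, wt_a], [mut_b, wt_b]], results[gene])
```

If many genes are tested this way, the resulting p-values would normally need a multiple-testing correction (for example Benjamini-Hochberg) before any of them is called significant.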
