Tutorial: Exploring cancer mutation data portals
10.5 years ago

This tutorial describes examples of data portals (visual interfaces, APIs, etc.) that allow a user to mine publicly available cancer sequence data for somatic and/or germline mutations. These resources allow the user to assess the recurrence of specific mutations within cancer subtypes, their sequence identity, predicted functional consequence, etc. Example questions one might ask of such resources:

  • What are the most significantly mutated genes in a particular cancer type?
  • Which mutations tend to co-occur with, or be mutually exclusive of, each other in a tumor?
  • What positions or domains within the amino acid sequence of a gene are most frequently mutated? That is, where are the mutation 'hotspots'?
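
As a rough illustration of the co-occurrence / mutual exclusivity question, here is a minimal sketch in Python. It assumes you have already exported, from whichever portal you use, the set of sample IDs carrying a mutation in each of two genes plus the full set of profiled samples. The function names (`contingency_counts`, `fisher_exact_two_sided`) and the toy sample IDs are hypothetical and not taken from any portal's API. The test is a hand-rolled two-sided Fisher's exact test using only the standard library; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead.

```python
from math import comb


def contingency_counts(samples_a, samples_b, all_samples):
    """Return (both, only_a, only_b, neither) sample counts for two genes."""
    cohort = set(all_samples)
    a = set(samples_a) & cohort
    b = set(samples_b) & cohort
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(cohort) - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_exact_two_sided(both, only_a, only_b, neither):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1 = both + only_a          # samples mutated in gene A
    col1 = both + only_b          # samples mutated in gene B
    n = both + only_a + only_b + neither

    def prob(k):
        # Probability of seeing k double-mutant samples given the margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(both)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy, made-up cohort: gene A mutated in s1-s5, gene B mutated in s6-s10.
cohort = [f"s{i}" for i in range(1, 11)]
gene_a = cohort[:5]
gene_b = cohort[5:]

table = contingency_counts(gene_a, gene_b, cohort)
p_value = fisher_exact_two_sided(*table)
expected_both = (table[0] + table[1]) * (table[0] + table[2]) / len(cohort)
trend = "co-occurrence" if table[0] > expected_both else "mutual exclusivity"
print(table, p_value, trend)
```

A small p-value only says the overlap is unlikely under independence. It does not account for tumor subtype, mutation burden, or multiple testing, all of which matter when many gene pairs are screened.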

Some relevant posts:

Here are some resources that I already know about and have used:

The first several are fantastic resources along the lines I am looking for. Please comment below if I am missing others. For example, there may be others that are less well known or that are more focused on a specific question.

Relevant reviews, primary articles, open-source software projects, etc. would also be welcome. I'm most interested in resources that create a platform for performing complex queries of the raw data, provide summaries and visualizations, etc. I will try to update this tutorial with examples and feedback from the community.

Here are some related resources that we have created ourselves to complement some of the resources listed above:

A nice introductory tutorial (video) on Cancer Variant Knowledgebases: "Introduction to Publicly Available Knowledgebases to Aid Interpretations of Genomic Findings in Oncology".

https://www.youtube.com/watch?v=4dBh1Qkp8os


thanks very very very much!!


You are most welcome. I just updated this post to include the Genomic Data Commons. This is a great resource for accessing the raw data, variant call files, etc.


Thank you very much!


Are you solely interested in tools that just visualize/download publicly available data sets? What about resources where you can actually submit your own mutations and annotate, analyze, and visualize?


I find Firebrowse to be a very useful place to access TCGA data.

8.8 years ago

If you have specific genes you are interested in, I wrote a tool to explore expression between tissues.


Nice! This looks awesome and performs very well. The above list is very DNA focused. Maybe we should create a separate post on Exploring cancer expression data portals...


Good idea, I'll delete this post.

6.9 years ago
rafi.zon ▴ 10

That's an excellent list of the different available databases. I'm doing research on driver mutations vs. passenger mutations, and I'm not sure which database to use to get a list of driver mutations that are known to cause cancer rather than just appearing in cancer samples.
Which databases or available datasets would you recommend most for this purpose?


These portals exist in part because it is still an open research question which mutations are definitively drivers versus passengers. That being said, for your needs you might want to approach this problem from the perspective of the more established tumor suppressors and oncogenes.

Here is a companion post to this one that covers those: Database Of Tumor Suppressors And/Or Oncogenes

Probably the most popular answer is to use the Cancer Gene Census.


Malachi, Thanks again!

I have already searched many of the databases for cancer driver mutations to use in supervised learning. The Cancer Gene Census is along the lines of what I'm looking for. However, it provides a list of cancer genes and doesn't differentiate between the passenger and driver mutations that could be present within the same cancer gene. Isn't there a list of well-known driver mutations that are known to have a direct carcinogenic effect? I believe the following portals are most closely related to my search:

Taken together, would these be reliable resources for my 'list' of driver mutations in cancer, or are there limitations to using them?


Other options that are in the vein of DoCM: Cancer Hotspots

Other options that are in the vein of CIViC: Jackson Lab's JAX CKB, MSKCC's OncoKB, Cornell's PMKB, IRB's CGI

If you would like to learn more about efforts to harmonize the cancer variant interpretation resources, you can check out cancervariants.org.

I wish I could say that there are not significant limitations to these resources, but there are. With all the tumor genome/exome sequence data that is now out there, it is starting to be possible to come up with lists of specific mutation sites that are significant hotspots. These are highly suggestive of activating mutations that are key drivers in cancer. We are still discovering new hotspots in rarer cancer types, though.

But the bigger problem is that of tumor suppressors (TS). It is relatively easy to define a TS that is significantly mutated at the gene level. But there are so many ways to break the function of a gene that we don't see the hotspot pattern we see with oncogenes. Yet these mutations can be just as critical to carcinogenesis. When we see a new mutation in BRCA1 or TP53, it can be hard to know for sure whether that mutation is pathogenic/functional. It might not have been seen previously. Until we functionally characterize it or see it enough times in cancers of that type, we are unsure whether it is just a random passenger that happens to be in a cancer gene or a true driver. Thus the cataloging of these mutations is highly incomplete.

Resources like the BRCA Exchange have expended enormous effort to try to do a decent job of tracking down the functional pathogenic vs. benign variants for just two cancer genes. Other databases focus on other genes. Coming up with one grand list that is comprehensive and high quality remains a major research challenge. Efforts such as the GA4GH and VICC driver project (cancervariants.org) are trying to harmonize the various efforts underway and at least make it easier to combine knowledge from across the many relevant resources and data sets out there. This is still very much a problem to be solved, though.
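
To make the hotspot vs. scattered-mutation contrast concrete, here is a minimal sketch in Python. It assumes you have a flat list of mutated amino acid positions for one gene, one entry per mutated sample. The function name `hotspot_summary`, the `min_count` threshold, and both position lists are made up for illustration. The threshold is an arbitrary cutoff and not a statistical test, so treat this as a first look and not as a way to call drivers.

```python
from collections import Counter


def hotspot_summary(positions, min_count=3):
    """Summarize how concentrated mutations are along a protein.

    positions: amino acid positions, one entry per observed mutation.
    min_count: hypothetical cutoff for calling a position recurrent.
    """
    if not positions:
        raise ValueError("positions must contain at least one mutation")
    counts = Counter(positions)
    total = sum(counts.values())
    # Positions seen at least min_count times, in sequence order.
    hotspots = sorted(pos for pos, c in counts.items() if c >= min_count)
    # Share of all mutations that fall on the single most mutated position.
    _, top_count = counts.most_common(1)[0]
    return {
        "total": total,
        "hotspots": hotspots,
        "top_fraction": top_count / total,
    }


# Made-up data: an oncogene-like pattern (one dominant position) ...
oncogene_like = [600] * 8 + [601, 469]
# ... and a tumor-suppressor-like pattern (mutations spread along the gene).
suppressor_like = [12, 45, 88, 130, 175, 175, 220, 248, 273, 301]

print(hotspot_summary(oncogene_like))
print(hotspot_summary(suppressor_like))
```

In the first toy list most mutations pile up on one position, while in the second no position reaches the cutoff even though the gene carries the same number of mutations. That is the situation where a gene-level signal exists but a per-mutation driver call remains uncertain.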


Thanks a lot for your elaborate answer, Malachi. Your answer gave me a better overview of the problem at hand. Let's hope with time and more research efforts we will understand the pathogenic pathways (as for the TS) much better, leading to more harmonization.
