Hello Biostars - Thank you for all the help lately - just one more question.
If I navigate to the COSMIC page, there are files containing the ~60 SBS loadings for individual cancer patients, organized by cancer type. These can be obtained by downloading, for instance, "SigProfilier_PCAWG_WGS_probabilities_SBS.csv", which is a flat file.
As can be seen below, each row is one patient's mutation rate for a given mutation type and trinucleotide context, while the columns are the SBS signatures, like this:
Sample    Cancer Type      Mutation Type  Mutation Subtype  SBS1       SBS2        SBS3  SBS4
SP117655  Biliary-AdenoCA  C>A            ACA               0.0045447  2.58E-06    0     0
SP117655  Biliary-AdenoCA  C>A            ACC               0.022974   0.0012906   0     0
SP117655  Biliary-AdenoCA  C>A            ACG               0.0083704  0.002148    0     0
SP117655  Biliary-AdenoCA  C>A            ACT               0.012359   0.00081708  0     0
SP117655  Biliary-AdenoCA  C>G            ACA               0.019838   4.12E-15    0     0
SP117655  Biliary-AdenoCA  C>G            ACC               0.019084   0.0018116   0     0
SP117655  Biliary-AdenoCA  C>G            ACG               0.0069102  0.00079127  0     0
SP117655  Biliary-AdenoCA  C>G            ACT               0.010542   0.00072964  0     0
SP117655  Biliary-AdenoCA  C>T            ACA               0.12931    0.00027331  0     0
SP117655  Biliary-AdenoCA  C>T            ACC               0.059811   0.011297    0     0
SP117655  Biliary-AdenoCA  C>T            ACG               0.97484    7.57E-05    0     0
SP117655  Biliary-AdenoCA  C>T            ACT               0.065793   0.011098    0     0
SP117655  Biliary-AdenoCA  T>A            ATA               0.016473   0.0017114   0     0
SP117655  Biliary-AdenoCA  T>A            ATC               0.056402   0.0192      0     0
SP117655  Biliary-AdenoCA  T>A            ATG               0.020153   0.00039076  0     0
SP117655  Biliary-AdenoCA  T>A            ATT               0.0024127  5.07E-15    0     0
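For concreteness, a file laid out like the excerpt could be read and indexed by sample roughly as follows. This is only a minimal sketch: it assumes the download is comma-separated with exactly the header names shown above, and the helper names (`read_probabilities`, `samples_by_cancer_type`) are made up for illustration. The embedded rows are three lines copied from the excerpt.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the excerpt, rewritten as comma-separated text.
# Assumption: the real download is comma-separated with these header names.
EXCERPT = """\
Sample,Cancer Type,Mutation Type,Mutation Subtype,SBS1,SBS2,SBS3,SBS4
SP117655,Biliary-AdenoCA,C>A,ACA,0.0045447,2.58E-06,0,0
SP117655,Biliary-AdenoCA,C>T,ACG,0.97484,7.57E-05,0,0
SP117655,Biliary-AdenoCA,T>A,ATT,0.0024127,5.07E-15,0,0
"""

# Columns that identify a row; every other column is treated as a signature.
ID_COLUMNS = ("Sample", "Cancer Type", "Mutation Type", "Mutation Subtype")


def read_probabilities(handle):
    """Index rows as {sample: {"cancer_type": ..., "contexts": {(type, subtype): {sig: value}}}}."""
    samples = {}
    for row in csv.DictReader(handle):
        entry = samples.setdefault(
            row["Sample"], {"cancer_type": row["Cancer Type"], "contexts": {}}
        )
        key = (row["Mutation Type"], row["Mutation Subtype"])
        entry["contexts"][key] = {
            name: float(value)
            for name, value in row.items()
            if name not in ID_COLUMNS
        }
    return samples


def samples_by_cancer_type(samples):
    """Group sample IDs by their cancer type label."""
    grouped = defaultdict(set)
    for sample_id, entry in samples.items():
        grouped[entry["cancer_type"]].add(sample_id)
    return dict(grouped)


samples = read_probabilities(io.StringIO(EXCERPT))
by_type = samples_by_cancer_type(samples)
print(by_type)
```

For the real file, the `io.StringIO(EXCERPT)` argument would be replaced by a handle from `open(path, newline="")`; `by_type["Biliary-AdenoCA"]` would then list the biliary sample IDs to look up elsewhere.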
The Sample column identifies the individual patient; you can readily see that this patient has biliary adenocarcinoma. OK, finally, here are the questions:
1) Biliary adenocarcinoma is a good start, but is there any way to drill down into these samples further? For instance, what would be the quickest way to separate the ~35 biliary adenocarcinoma patients in this file into subcategories such as IDH1+, IDH2+, FGFR2-fusion+, etc.? I feel sure this must be possible. I'd prefer an annotated, metadata-like file, but if need be, I could probably download the raw data itself and work out the drivers from that.
Is anyone familiar enough with this site to know a quick way to do it?
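To make the intended split concrete: if per-sample driver calls were available as (sample, driver) pairs, the grouping itself could be sketched as below. Everything here is an assumption for illustration. The sample IDs and driver calls are placeholders rather than real PCAWG data, `stratify` is a made-up helper, and whether COSMIC or PCAWG provides annotations in this shape is exactly what is being asked.

```python
from collections import defaultdict

# Hypothetical annotation rows: (sample ID, driver event).
# These IDs and calls are placeholders, not real PCAWG data.
DRIVER_CALLS = [
    ("EXAMPLE_01", "IDH1"),
    ("EXAMPLE_02", "FGFR2-fusion"),
    ("EXAMPLE_03", "IDH2"),
    ("EXAMPLE_03", "FGFR2-fusion"),
]

# Placeholder sample IDs, standing in for the biliary samples in the file.
BILIARY_SAMPLES = ["EXAMPLE_01", "EXAMPLE_02", "EXAMPLE_03", "EXAMPLE_04"]


def stratify(sample_ids, driver_calls, drivers_of_interest):
    """Assign each sample to a subgroup named after the drivers it carries."""
    drivers_per_sample = defaultdict(set)
    for sample_id, driver in driver_calls:
        if driver in drivers_of_interest:
            drivers_per_sample[sample_id].add(driver)

    groups = defaultdict(list)
    for sample_id in sample_ids:
        drivers = drivers_per_sample.get(sample_id)
        # Samples carrying several drivers get a combined label, e.g. "A+B".
        label = "+".join(sorted(drivers)) if drivers else "none_of_interest"
        groups[label].append(sample_id)
    return dict(groups)


groups = stratify(BILIARY_SAMPLES, DRIVER_CALLS, {"IDH1", "IDH2", "FGFR2-fusion"})
for label, members in groups.items():
    print(label, members)
```

Samples with more than one driver of interest are kept in their own combined group rather than counted twice, and samples with no matching annotation land in `none_of_interest`, so missing annotations stay visible instead of being silently dropped.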
2) I imagine adjusting for these SBS loadings is just like adjusting for loadings of other kinds, e.g. principal components. But I wanted to ask: are there any pitfalls or idiosyncratic differences to be aware of? For example, do I need to match for sex? Alternative splicing differs between the sexes in Drosophila, some of these cancers have dysregulated splicing, and so on. I just want to avoid making any mistakes.
Thank you for your help and advice.