Hi there,
I am using GISTIC2 to identify recurrent SCNAs (somatic copy number alterations) on a cohort of primary tumour samples matched with metastatic tumour samples.
I ran GISTIC2 separately on the primary tumours and then metastatic tumours. For each recurrent amplification and deletion identified, there is a G-score.The G-score is calculated based on the amplitude of the aberration as well as the frequency of its occurrence across samples.
Now for each amplification and deletion identified in the metastatic tumours, I would like to compare the G-score to the G-score in the primary tumours to identify regions distinct to mets.
Would anyone have any experience in doing this?
I was thinking of utilising ChIP-Seq differential binding methods or I have seen a paper using a gaussian process latent difference model. They provide the math but I am not sure in practise where to start with it.
My first approach was as follows:
The GISTIC2 G score data looks like this:
head(prim.gistic@gis.scores, n=3)
variant chr start end fdr G_Score Avg_amplitude frequency
Amp 1 1582202 1635250 0 0.183352 0.379261 0.230769
Amp 1 1636274 1638925 0 0.204790 0.374585 0.256410
Amp 1 1650969 1659000 0 0.212745 0.398385 0.256410
head(mets.gistic@gis.scores, n=3)
variant chr start end fdr G_Score Avg_amplitude frequency
Amp 1 1582202 1644747 0 0.075575 0.228630 0.205128
Amp 1 1647900 1650845 0 0.142388 0.358165 0.205128
Amp 1 1666300 1689800 0 0.183941 0.438726 0.205128
As you can see the start, end position for each line is not the same in the primary and mets. Therefore I tried using subsetOverlaps from GRanges to find intervals that overlap, with the intention of comparing mets gscore vs prim gscore. However, only one gscore is in the return GRanges object.
> prim.gr <- makeGRangesFromDataFrame( df = prim.gistic@gis.scores,
keep.extra.columns = TRUE,
seqnames.field = "chr",
start.field = "start",
end.field = "end" )
> mets.gr <- makeGRangesFromDataFrame( df = mets.gistic@gis.scores,
keep.extra.columns = TRUE,
seqnames.field = "chr",
start.field = "start",
end.field = "end" )
> subsetByOverlapsmets.gr, prim.gr)
GRanges object with 36805 ranges and 5 metadata columns:
seqnames ranges strand | variant fdr G_Score Avg_amplitude frequency
<Rle> <IRanges> <Rle> | <character> <numeric> <numeric> <numeric> <numeric>
[1] 1 1582202-1644747 * | Amp 0 0.075575 0.22863 0.205128
[2] 1 1647900-1650845 * | Amp 0 0.142388 0.358165 0.205128
[3] 1 1666300-1689800 * | Amp 0 0.183941 0.438726 0.205128
[4] 1 1696879-1718750 * | Amp 0 0.155353 0.478574 0.179487
[5] 1 1720600-1849550 * | Amp 0 0.185098 0.476421 0.205128
Is there a way using GRanges to retain both gscore columns in the subsetOverlaps output?
If there isn't, my next step was to take the chr, start, end and gscore columns to make a bed file for the primary tumours and mets tumours and then use HOMER to find differential peaks,
http://homer.ucsd.edu/homer/ngs/mergePeaks.html
Any thoughts would be much appreciated!
Hi, have you found a statisticak test to compare the g-score from two groups? I have seen also a paper using a gaussian process latent difference model (https://www.nature.com/articles/s41588-020-0592-7), but I'm not sure how to do that. Any thoughts would be much appreciated, thanks!