Comparing GISTIC2 G-Scores
2
3
Entering edit mode
4.4 years ago
noodle ▴ 70

Hi there,

I am using GISTIC2 to identify recurrent SCNAs (somatic copy number alterations) on a cohort of primary tumour samples matched with metastatic tumour samples.

I ran GISTIC2 separately on the primary tumours and then metastatic tumours. For each recurrent amplification and deletion identified, there is a G-score.The G-score is calculated based on the amplitude of the aberration as well as the frequency of its occurrence across samples.

Now for each amplification and deletion identified in the metastatic tumours, I would like to compare the G-score to the G-score in the primary tumours to identify regions distinct to mets.

Would anyone have any experience in doing this?

I was thinking of utilising ChIP-Seq differential binding methods or I have seen a paper using a gaussian process latent difference model. They provide the math but I am not sure in practise where to start with it.

My first approach was as follows:

The GISTIC2 G score data looks like this:

head(prim.gistic@gis.scores, n=3)

variant  chr  start    end     fdr   G_Score   Avg_amplitude   frequency
Amp      1    1582202  1635250  0    0.183352   0.379261       0.230769
Amp      1    1636274  1638925  0    0.204790   0.374585       0.256410
Amp      1    1650969  1659000  0    0.212745   0.398385       0.256410



head(mets.gistic@gis.scores, n=3)
variant  chr  start     end       fdr   G_Score  Avg_amplitude frequency
Amp       1    1582202  1644747    0     0.075575    0.228630  0.205128
Amp       1    1647900  1650845    0     0.142388    0.358165  0.205128
Amp       1    1666300  1689800    0     0.183941    0.438726  0.205128

As you can see the start, end position for each line is not the same in the primary and mets. Therefore I tried using subsetOverlaps from GRanges to find intervals that overlap, with the intention of comparing mets gscore vs prim gscore. However, only one gscore is in the return GRanges object.

> prim.gr <- makeGRangesFromDataFrame( df = prim.gistic@gis.scores, 
keep.extra.columns = TRUE, 
seqnames.field = "chr",
start.field = "start",
end.field = "end" )

> mets.gr <- makeGRangesFromDataFrame( df = mets.gistic@gis.scores, 
keep.extra.columns = TRUE, 
seqnames.field = "chr",
start.field = "start",
end.field = "end" )

> subsetByOverlapsmets.gr, prim.gr)
GRanges object with 36805 ranges and 5 metadata columns:

seqnames      ranges        strand | variant  fdr   G_Score  Avg_amplitude  frequency
<Rle>  <IRanges>  <Rle> |  <character> <numeric>  <numeric> <numeric> <numeric>
 [1]        1   1582202-1644747  * |   Amp    0     0.075575     0.22863      0.205128
 [2]        1   1647900-1650845  * |   Amp    0     0.142388     0.358165     0.205128
 [3]        1   1666300-1689800  * |   Amp    0     0.183941     0.438726     0.205128
 [4]        1   1696879-1718750  * |   Amp    0     0.155353     0.478574     0.179487
 [5]        1   1720600-1849550  * |   Amp    0     0.185098     0.476421     0.205128

Is there a way using GRanges to retain both gscore columns in the subsetOverlaps output?

If there isn't, my next step was to take the chr, start, end and gscore columns to make a bed file for the primary tumours and mets tumours and then use HOMER to find differential peaks,

http://homer.ucsd.edu/homer/ngs/mergePeaks.html

Any thoughts would be much appreciated!

GISTIC2 copy number paired samples wxs whole exome • 3.3k views
ADD COMMENT
0
Entering edit mode

Hi, have you found a statisticak test to compare the g-score from two groups? I have seen also a paper using a gaussian process latent difference model (https://www.nature.com/articles/s41588-020-0592-7), but I'm not sure how to do that. Any thoughts would be much appreciated, thanks!

ADD REPLY
0
Entering edit mode
4.4 years ago

However, only one gscore is in the return GRanges object.

Have you simply tried to change the column names in both input files? - that should work.

Kevin

ADD COMMENT
0
Entering edit mode

Thanks Kevin,

I did try that and it just returns the meta columns for the first granges object

library(GenomicRanges)

colnames(prim.gistic@gis.scores ) <- paste0("prim_", colnames(prim.gistic@gis.scores ))

prim.gr <- makeGRangesFromDataFrame(
  df = prim.gistic@gis.scores,
  keep.extra.columns = TRUE,
  seqnames.field = "prim_Chromosome",
  start.field = "prim_Start_Position",
  end.field = "prim_End_Position"
)

colnames(mets.gistic@gis.scores ) <- paste0("mets_", colnames(mets.gistic@gis.scores ))

mets.gr <- makeGRangesFromDataFrame(
  df = mets.gistic@gis.scores,
  keep.extra.columns = TRUE,
  seqnames.field = "mets_Chromosome",
  start.field = "mets_Start_Position",
  end.field = "mets_End_Position"
)

# find mets regions also in prim that overlap
subsetByOverlaps ( mets.gr, prim.gr,type = "equal")
GRanges object with 76 ranges and 5 metadata columns:

seqnames ranges strand | mets_Variant_Classification mets_fdr mets_G_Score mets_Avg_amplitude mets_frequency
  <Rle>     <IRanges>    <Rle> |  <character> <numeric> <numeric>  <numeric>  <numeric>
[1] 1     8022900-8075600     * |   Amp         0     0.266339     0.962888    0.128205
ADD REPLY
0
Entering edit mode

Oh, I think that I faced this before a few times... You have to produce the overlaps object, and then 're-overlap' this with the original data-frames in two separate passes. Can you try that?

ADD REPLY
0
Entering edit mode

Okies, i think I have it figured out- thanks Kevin! 2 heads are definitely better than 1

# find mets regions also in prim that overlap
ol.gr  <- subsetByOverlaps( mets.gr, prim.gr, type = "equal")

# index for prim hits in mets
both.ind <- findOverlaps(query = mets.gr, subject = prim.gr, type = "equal")

# grab prim meta cols by index 
tmp.gr  <- prim.gr[subjectHits(both.ind),]

# add meta cols to ol.gr
mcols( ol.gr)[c(6:10)] <- mcols( tmp.gr)

head( ol.gr,n=2)
GRanges object with 2 ranges and 11 metadata columns:
seqnames  ranges strand | mets_Variant_Classification  mets_fdr mets_G_Score mets_Avg_amplitude mets_frequency
     <Rle>           <IRanges>  <Rle> | <character> <numeric>    <numeric>   <numeric>  <numeric>
[1]  1     8022900-8075600      * |  Amp  0     0.266339           0.962888       0.128205
[2]  1 203098275-203134950  * |  Amp  0.738406     0.923121           1.066993       0.871795

  prim_Variant_Classification  prim_fdr prim_G_Score prim_Avg_amplitude prim_frequency    
                  <character> <numeric>    <numeric>          <numeric>      <numeric>         
[1]   Amp         0     0.107352            0.31092       0.205128 
[2]  Amp  0.073643      0.46918           0.695091       0.923077

ADD REPLY
0
Entering edit mode

In terms of calculating the difference in G scores at overlapping regions, does taking the log2 fold change mets over prim seem like a good strategy?

# calc log2 FC
ol.gr$diff_gscore <- log2( ol.gr$mets_G_Score+1) - log2( ol.gr$prim_G_Score+1)
ADD REPLY
1
Entering edit mode

Great that you got it working, but I would definitely 'spot check' the end result just to make sure that it is as you expect it to be.

For G score differences, I am not sure. Are these scores not already log2 ratios? Would a subtraction not then be better, in that case?

For recurrent aberrations, I use a different program called GAIA (in R).

ADD REPLY
1
Entering edit mode

Yep, apologies, these values are already logged.

   ol.gr$diff_gscore <- ol.gr$mets_G_Score - ol.gr$prim_G_Score

Thanks Kevin!

ADD REPLY

Login before adding your answer.

Traffic: 1366 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6