Question

CNA calculation from Segmentation file

1

Entering edit mode

9.3 years ago

Meenakshi ▴ 10

Hi

I have a compiled segmentation file (nocnv) for almost 1000 samples. I want to calculate the copy number aberrations for all the samples.

I tried using GISTIC, but I believe it provides only the consensus copy number variations. Is there a better way to calculate copy number aberration from segmentation file.

Here are few lines from the segmentation file I have:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
TCGA-E2-A1IH-01A-11D-A13J-01    1    3208470    42629316    20791    -0.2864
TCGA-E2-A1IH-01A-11D-A13J-01    1    42631290    42692913    60    0.0183
TCGA-E2-A1IH-01A-11D-A13J-01    1    42701180    42918495    113    0.3411
TCGA-E2-A1IH-01A-11D-A13J-01    1    42925163    43153569    131    -0.0548
TCGA-E2-A1IH-01A-11D-A13J-01    1    43153652    43599777    288    -0.3653
TCGA-E2-A1IH-01A-11D-A13J-01    1    43600151    44642193    575    -0.0401

Thanks

Meenakshi

Segmentation-File CNA copy-number • 6.7k views

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by Meenakshi ▴ 10

0

Entering edit mode

When you say segmentation file, that sounds like it already contains CNV segments. Can you show us some of the file so we know what you need to do?

ADD REPLY • link 9.3 years ago by karl.stamm 4.1k

0

Entering edit mode

I added some sample lines from the segmentation file for your review.

Thanks!

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by Meenakshi ▴ 10

0

Entering edit mode

Do you mean you want to know how much CNAs there are per sample?

ADD REPLY • link 9.3 years ago by Irsan ★ 7.8k

0

Entering edit mode

Yes, I am trying to calculate the Genome instability index per sample using the CNA from each sample. So, I want to quantify the CNAs using the segmentation file for each sample.

Thanks

ADD REPLY • link 9.3 years ago by Meenakshi ▴ 10

1

Entering edit mode

This is called data aggregation. Although you can do this in any programming language, I think for you it is easiest to do with Excel pivot tables. Follow some tutorials on how to use pivot tables and then apply to your segmentation results. As Charles warden suggested, you might want to add a column that describes if a segment meets several criteria (length, amplitude) and is therefore considered significant.

ADD REPLY • link 9.2 years ago by Irsan ★ 7.8k

0

Entering edit mode

Could you explain how you calculated the genome instability using CNA? Thanks

ADD REPLY • link 8.1 years ago by biostar • 0

0

Entering edit mode

9.3 years ago

Meenakshi ▴ 10

Thank you all for your suggestions and guidance.

Based on what I could gather, I am applying different cutoffs on #Probes and Segment length to filter out the noisy data. Using the chosen data , I am calculating Absolute CN using (2^Segment_mean)*2. if the value is >2 it can be considered as a copy number change, but to get more significant CNs, I am considering 2.7 as cutoff for absolute CNS. This is purely intuitive and I still need to work a way out to get optimal cutoffs.

Thanks for all the help.. Any comment/criticism is welcome.

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by Meenakshi ▴ 10

0

Entering edit mode

Actually I think you don't need the cutoff and it might even be better if you don't use them. You can just calculate the number of segments per sample and that will be proportional to the genomic instability. Try both methods and see what the correlation is between them.

ADD REPLY • link 9.2 years ago by Irsan ★ 7.8k

Ram · Accepted Answer · 2015-07-27

4

Entering edit mode

9.3 years ago

Charles Warden 8.3k

You'll want to define some threshold for |Segment_Mean| > X because I would probably not consider all segments to be true copy number alterations. Filtering based upon length and number of probes is probably also a good idea.

I don't have a precise recommendation of what threshold to use, but I would try plotting density / histogram values for the Segment_Mean to guide your decisions.

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by Charles Warden 8.3k

3

Entering edit mode

Here is a little R function to plot segments.

plot_segments = function(segments_file){
  seg = read.delim(segments_file, sep='\t)
  seg.spl = split(seg,as.factor(as.character(seg$Chromosome)))

  pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8)

  par(mfrow=c(4,1))

  for(i in 1:length(seg.spl)){
    x = seg.spl[[i]]
    plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep='_'),ylab = 'log2 ratio')
    points(x$End,x$Segment_Mean,pch = '')
    segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = 'maroon')
    abline(h = 0, lty = 1,lwd = 0.5)    
  }
  dev.off()
}

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by poisonAlien ★ 3.2k

2

Entering edit mode

@poisonAlien, Thank you for the great script. There were some typos in it so I fixed it. Here is the functioning script.

plot_segments = function(segments_file){
    seg = read.delim(segments_file, sep="\t")
    seg.spl = split(seg,as.factor(as.character(seg$Chromosome)))
    pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8)
    par(mfrow=c(4,1))
    for(i in 1:length(seg.spl)){
        x = seg.spl[[i]] 
        plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep="_"),ylab = "log2 ratio")
        points(x$End,x$Segment_Mean,pch = "")
        segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = "maroon")
        abline(h = 0, lty = 1,lwd = 0.5)  
    }
    dev.off()
}

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.2 years ago by Les Ander ▴ 110

1

Entering edit mode

That's right. It looks like Meenakshi already has segments and their log-ratio copynumber to some kind of control. Thus when the value says 0 at 42.6MB it's a copy-normal segment, negatives are losses and positives are gains. There's a lot of noise on a microarray and you have to choose a cutoff for a strong signal that you really believe. If the Segment Mean says 1, that's a log2 of 200%, or one gain, possibly 4 copies of a naturally diploid organism.

Anyway, histogram and filter the segment means as CharlesWarden suggested and take a look at the sex chromosomes for guidance (known ploidy).

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.3 years ago by karl.stamm 4.1k

0

Entering edit mode

Hi, a quick question. I am wondering to what the Num_Probes is referring to. Is it the number of hetrozygous SNPs within a segment or number of read depth counts within a segment?

ADD REPLY • link 8.6 years ago by Dataman ▴ 380