CBS segmentation with DNAcopy taking ages/hanging
1
3
Entering edit mode
10.5 years ago
Irsan ★ 7.8k

I want to get genome wide copy number calls from a cohort (N=45) of human tumor-normal samples genotyped by Affymetrix GenomeWide SNP6.0 arrays. I have normalized the raw data and calculated Log2-R-Ratios for all probes (N=933,122) for all tumor-normal pairs, so far so good. However, when I perform CBS-segmentation with pruning implemented by DNAcopy it hangs (after 48h I killed it) on the 12the sample. However, when I randomly dilute the dataset to 100,000 probes, it takes about 10 seconds per sample to do the job. Is it something intrinsic from DNAcopy that it exponentially gets slower when the numbers of probes increase? What can I do to get good (i.e. not too much segments due to local trends) CBS-segmented copy number calls?

As a bonus question, in DNAcopy the min.width parameter only allows values between 2 and 5. What can I do when I want the segmentation algorithm to use at least 20 probes for segments?

library(DNAcopy)
samples <- colnames(lrr)[grep("MySampleIdentifier",colnames(lrr))]
number.probes <- 100e3
lrr <- lrr[sample(1:nrow(lrr),number.probes,replace=F),]
cna <- CNA(
  genomdat=lrr[,samples],
  chrom=lrr$chromosome,
  maploc=lrr$physicalPosition,
  data.type="logratio",
  sampleid=samples
)
segments <- segment(
  cna,
  min.width=5,
  undo.splits="prune",
  undo.prune=0.05
)
software-error snp • 4.9k views
ADD COMMENT
1
Entering edit mode
10.5 years ago
arno.guille ▴ 420
segments <- segment(cna)

With the "prune" option, the CBS is very slow for high resolution chips. I suggest you to keep the default parameters.

Then the min.width parameter seems to have no effect. Try this script in addition to the CBS

http://www.cbs.dtu.dk/~hanni/aCGH/MergeLevels.html

ADD COMMENT
0
Entering edit mode

Hi Arno, thanks for your suggestions, I'll try the MergeLevels script after default segmentation.

ADD REPLY
0
Entering edit mode

Hmm, the MergeLevels script seems very inconvenient as you have to transform the segmented data back to a segmented value for each probe. You can transform the CBS output this way:

predicted.lrr <-
  as.matrix(
    sapply(unique(segments$output$ID),
    function(x){
      index <- which(segments$output$ID==x)
      as.numeric(unlist(apply(segments$output[index,],1,function(t){rep(t[6],t[5])})))
    })
  )

But still, when MergeLevels is done you end up with a vector of corrected segment levels but not in the DNAcopy output format with 1 row per segment which for me is very convenient for downstream analyses. I will use CGHcall/CGHregions in stead.

ADD REPLY

Login before adding your answer.

Traffic: 1915 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6