GISTIC segment overlap error
6.0 years ago
Wenhu_Cao ▴ 100

Hi everyone,

I am a little confused by the following error reported by GISTIC:

GISTIC 2.0 input error detected:
169 segment overlaps detected in file '/path/***.seg.txt'.
First overlap detected between segments at lines 26227 and 26377.

I checked the first pair of overlapping lines in R; they are shown below:

line 26227:

Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 14832648       6157       -0.160

line 26377:

Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 49468025      24550       -0.099

I downloaded the TCGA segmentation file from Firehose. The only changes I made were removing NAs and shortening the Sample names (keeping only the first 12 characters). Does anyone know how to deal with this?

Thanks very much!

GISTIC SCNA CNV SNP • 5.0k views

I think I may have found a possible cause: collapsing barcodes to patient IDs (the first 12 characters) gives different sample types from the same patient a single shared name. I will test this idea and report back.

I think the reason is that the CNV segment on line 26227 is contained within the segment on line 26377. Both come from the same sample and the same chromosome, but their start and end coordinates overlap. You need to resolve these overlaps, and it looks like there are 169 of them in your segmentation file. Good luck!
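If it helps, here is a minimal Python sketch for listing overlapping segments yourself before running GISTIC. It assumes a tab-separated .seg file whose header has the Sample, Chromosome, Start and End columns shown above, and it treats coordinates as inclusive (so a segment starting at the previous segment's End counts as overlapping). I don't know exactly how GISTIC defines an overlap internally, so treat this as an approximation of its check, not a copy of it. `find_overlaps` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def find_overlaps(lines):
    """Return (line_a, line_b) pairs of overlapping segments.

    Assumptions: tab-separated .seg layout with a header row containing
    Sample, Chromosome, Start and End; inclusive coordinates. Line numbers
    are 1-based file lines, with the header as line 1.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    groups = defaultdict(list)
    for lineno, row in enumerate(reader, start=2):
        key = (row["Sample"], row["Chromosome"])
        groups[key].append((int(row["Start"]), int(row["End"]), lineno))

    overlaps = []
    for segs in groups.values():
        segs.sort()  # order by Start (then End) within one sample/chromosome
        prev_end, prev_line = None, None
        for start, end, lineno in segs:
            if prev_end is not None and start <= prev_end:
                overlaps.append((prev_line, lineno))
            # remember the segment that reaches furthest to the right so far
            if prev_end is None or end > prev_end:
                prev_end, prev_line = end, lineno
    return sorted(overlaps)


# The two rows from the question, as a tiny in-memory .seg file.
example = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "TCGA-G4-6317\t1\t3218610\t14832648\t6157\t-0.160\n"
    "TCGA-G4-6317\t1\t3218610\t49468025\t24550\t-0.099\n"
)
print(find_overlaps(io.StringIO(example)))  # [(2, 3)]
```

For a real file, open it with `open(path, newline="")` and pass the file handle instead of the `StringIO`.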

6.0 years ago
Wenhu_Cao ▴ 100

Thanks to the time-zone difference between me and most Biostars users, I had time to solve this problem on my own.

The reason now seems trivial, but I will note it for completeness. I took the first 12 characters of the Tumor-Sample-Barcode to get a patient-only barcode for further analysis. The problem is that TCGA did not always collect exactly one tumor sample per patient. In my dataset, for example, I found '01' (Primary Solid Tumor), '02' (Recurrent Solid Tumor) and '06' (Metastatic) samples for a single patient. The sample type code is characters 14-15 of the Tumor-Sample-Barcode; see the TCGA barcode documentation and the TCGA code tables for details.

Truncating the barcode therefore gives results from different samples of the same patient the same patient barcode as their ID. That makes GISTIC complain, and it would also introduce bias into the downstream analysis.
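A small Python illustration of the collision: grouping barcodes by their first 12 characters merges the different sample types, while keeping 15 characters (patient plus sample type) keeps them apart. The barcodes below are made up for illustration (the patient `TCGA-XX-1234` is hypothetical), and `sample_type` / `collisions` are hypothetical helper names.

```python
from collections import defaultdict


def sample_type(barcode):
    """Characters 14-15 (1-based) of a TCGA barcode: the sample type code."""
    return barcode[13:15]


def collisions(barcodes, width):
    """Group barcodes by their first `width` characters and return the
    truncated IDs that map to more than one distinct full barcode."""
    groups = defaultdict(set)
    for bc in barcodes:
        groups[bc[:width]].add(bc)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


# Hypothetical barcodes: one patient, three sample types (01, 02, 06).
barcodes = [
    "TCGA-XX-1234-01A-11D-0001-01",
    "TCGA-XX-1234-02A-11D-0002-01",
    "TCGA-XX-1234-06A-11D-0003-01",
]
print([sample_type(bc) for bc in barcodes])  # ['01', '02', '06']
print(collisions(barcodes, 12))  # all three collapse onto 'TCGA-XX-1234'
print(collisions(barcodes, 15))  # {}
```

Keeping the sample type in the ID (or filtering to one sample type first) avoids this particular collision.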

OK, that's my experience; I'm posting it here as a reminder for myself!

Thanks, but I filtered to Primary Solid Tumor samples and GISTIC still reports the error. I found that characters 22-25 of the Tumor-Sample-Barcode can also differ within the same patient, which again causes the error. Can you help me?
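As far as I understand the TCGA barcode layout, characters 22-25 are the plate field. The same sample can therefore appear as several aliquots, each with its own segments. One possible workaround is to keep a single aliquot per sample before building the .seg file. The Python sketch below does that. It assumes rows are dicts with a full-barcode "Sample" field, and it uses an arbitrary placeholder rule (keep the lexicographically last barcode). You should choose which aliquot to keep deliberately; this is not an official TCGA or GISTIC rule. The function names and barcodes are hypothetical.

```python
def pick_one_aliquot(barcodes, id_width=15):
    """Map each sample ID (first `id_width` characters) to one barcode.

    Placeholder rule (assumption): keep the lexicographically last barcode.
    """
    chosen = {}
    for bc in barcodes:
        key = bc[:id_width]
        if key not in chosen or bc > chosen[key]:
            chosen[key] = bc
    return chosen


def filter_segments(rows, chosen, id_width=15):
    """Keep only rows from the chosen aliquots and relabel them with the
    shorter sample ID, so each ID maps to exactly one aliquot."""
    keep = set(chosen.values())
    out = []
    for row in rows:
        if row["Sample"] in keep:
            new = dict(row)
            new["Sample"] = row["Sample"][:id_width]
            out.append(new)
    return out


# Hypothetical primary-tumor aliquots differing only in the plate field.
barcodes = ["TCGA-XX-1234-01A-11D-0001-01", "TCGA-XX-1234-01A-11D-0002-01"]
rows = [
    {"Sample": barcodes[0], "Chromosome": "1", "Start": "1", "End": "100"},
    {"Sample": barcodes[1], "Chromosome": "1", "Start": "1", "End": "200"},
]
chosen = pick_one_aliquot(barcodes)
filtered = filter_segments(rows, chosen)
print(filtered)  # one row, Sample relabelled to 'TCGA-XX-1234-01'
```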

zhaoliang0302, you should consider contacting the GISTIC team at the Broad Institute directly: