How to perform GISTIC analysis on GenePattern?
5.2 years ago

Hi all, I have CNV data from TCGA and a markers file, shown below:

Seg file:

Sample Chromosome Start End Num_Probes Segment_Mean
TCGA.05.4249.01A 1 3218610 120527361 67456 -0.1725
TCGA.05.4249.01A 1 149881398 167526508 10663 0.5859
TCGA.05.4249.01A 1 167526675 167526823 2 -1.1518
TCGA.05.4249.01A 1 167526972 247813706 50571 0.5816
TCGA.05.4249.01A 2 484222 242476062 130861 0.0585

Markers file:

Probe.Name Chromosome Start
CN_473963 1 61735
CN_473964 1 61808
CN_473965 1 61823
CN_477984 1 62152
CN_473981 1 62920
CN_473982 1 62937

My goal is to run a GISTIC analysis with the GISTIC 2.0 module in GenePattern, but the job always fails with this message:

"GISTIC version 2.0.23 GISTIC 2.0 input error detected: 76606 segment start or end positions in '/opt/gpcloud/gp_home/users/genye/uploads/tmp/run8835511072592266907.tmp/seg.file/1/biguoshu1.txt' do not match any markers in '/opt/gpcloud/gp_home/users/genye/uploads/tmp/run6032895478607754030.tmp/markers.file/2/markersMatrix.txt'. First bad position is 10:24732567 at line 33."

I uploaded my files in .txt format and chose GISTIC version 6.15.28 with Human_hg19.mat as the refgene file. All other parameters were left at their defaults. Could anyone tell me what the problem is and how to solve it? Thank you!

Tags: snp, RNA-Seq, CNV, TCGA, GISTIC

Hi, did you fix the problem? I have the same problem.

Please show the exact error message, a sample of your input data, and all commands that you have tried. Thank you.

Thank you for your reply!

Error: GISTIC 2.0 input error detected:
198278 segment start or end positions in '/opt/gpcloud/gp_home/users/yuduoduo/uploads/tmp/run1102255665308516451.tmp/seg.file/1/MaskedCopyNumberSegment.txt' do not match any markers in '/opt/gpcloud/gp_home/users/yuduoduo/uploads/tmp/run5183191344025581000.tmp/markers.file/2/genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1(1).txt'.
First bad position is 1:2116145 at line 1.

Input (seg) file:

TCGA-MQ-A4LJ-01A    1   62920   2116145 358 0.0051
TCGA-MQ-A4LJ-01A    1   2125269 3259074 359 -0.0884
TCGA-MQ-A4LJ-01A    1   3259896 12779433    5732    0.0459
TCGA-MQ-A4LJ-01A    1   12792599    12922922    33  -0.7534

Markers file:

genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt
CN_473963   1   61735
CN_473964   1   61808
CN_473965   1   61823
CN_477984   1   62152
CN_473981   1   62920
CN_473982   1   62937
CN_497980   1   72704

All other parameters were left at their defaults. Thank you!
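Both error messages say that segment start/end positions must coincide with marker positions. A minimal Python sketch for listing the offending positions before uploading; it is my own reimplementation of that check as I read the error text, not GISTIC's code. It assumes whitespace-delimited files with the column order shown in this thread and compares chromosome names literally, so "chr1" vs "1" would also be reported as a mismatch.

```python
from typing import Iterable, List, Set, Tuple


def _rows(lines: Iterable[str], has_header: bool) -> List[List[str]]:
    """Split whitespace-delimited lines into fields, dropping blanks and an optional header."""
    rows = [line.split() for line in lines if line.strip()]
    return rows[1:] if has_header else rows


def unmatched_positions(
    seg_lines: Iterable[str],
    marker_lines: Iterable[str],
    seg_header: bool = True,
    marker_header: bool = True,
) -> List[Tuple[int, str, int]]:
    """Return (data_row, chromosome, position) for each seg start/end with no marker at that exact position."""
    # Markers file columns: probe name, chromosome, position
    known: Set[Tuple[str, int]] = {
        (m[1], int(m[2])) for m in _rows(marker_lines, marker_header)
    }
    bad: List[Tuple[int, str, int]] = []
    # Seg file columns: sample, chromosome, start, end, num probes, segment mean.
    # data_row counts data rows only (header excluded); it may not match GISTIC's numbering.
    for row_no, s in enumerate(_rows(seg_lines, seg_header), start=1):
        chrom, start, end = s[1], int(s[2]), int(s[3])
        for pos in (start, end):
            if (chrom, pos) not in known:
                bad.append((row_no, chrom, pos))
    return bad


if __name__ == "__main__":
    # Excerpts posted above (no header lines); only a few markers are included here
    seg = ["TCGA-MQ-A4LJ-01A\t1\t62920\t2116145\t358\t0.0051"]
    markers = ["CN_473963\t1\t61735", "CN_473981\t1\t62920", "CN_473982\t1\t62937"]
    print(unmatched_positions(seg, markers, seg_header=False, marker_header=False))
    # [(1, '1', 2116145)]
```

On the full files you would pass open file handles instead, e.g. `unmatched_positions(open("seg.txt"), open("markers.txt"))`, where the file names are placeholders. With only the posted excerpts, the start 62920 matches marker CN_473981 while the end 2116145 does not, which is consistent with "First bad position is 1:2116145 at line 1".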

You could try without the markers file, which is now possible with later versions of GISTIC. Also, just double-check that the formatting of your files is correct.
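For the "double-check the formatting" step, a rough Python sketch of a few sanity checks on a seg file. These are my own checks, not GISTIC's validator; they assume the six-column, whitespace-delimited layout used in this thread (so sample names must not contain spaces), and the two broken example rows are made up for illustration.

```python
from typing import Iterable, List


def check_seg_format(lines: Iterable[str], has_header: bool = True) -> List[str]:
    """Return a list of problems found in a whitespace-delimited, 6-column seg file."""
    rows = [line for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]
    problems: List[str] = []
    for i, line in enumerate(rows, start=1):
        fields = line.split()
        # Expected: sample, chromosome, start, end, num_probes, segment_mean
        if len(fields) != 6:
            problems.append(f"row {i}: expected 6 columns, found {len(fields)}")
            continue
        _, _, start, end, n_probes, mean = fields
        try:
            s, e = int(start), int(end)
            int(n_probes)
            float(mean)
        except ValueError:
            problems.append(f"row {i}: start/end/num_probes/mean not numeric")
            continue
        if s > e:
            problems.append(f"row {i}: start {s} is after end {e}")
    return problems


if __name__ == "__main__":
    seg = [
        "Sample Chromosome Start End Num_Probes Segment_Mean",
        "TCGA-MQ-A4LJ-01A 1 62920 2116145 358 0.0051",
        "TCGA-MQ-A4LJ-01A 1 3259896 12779433",  # made-up truncated row
        "TCGA-MQ-A4LJ-01A 1 12922922 12792599 33 -0.7534",  # made-up swapped start/end
    ]
    for problem in check_seg_format(seg):
        print(problem)
```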

Thank you for your help, Kevin! But another problem has come up: many regions are called as amplified/deleted. The plots are very noisy, with amplification/deletion in almost every gene. Could you please tell me what the problem is and how to solve it? Thank you!

Hi, you are not giving me much information with which to begin to help. Please share, in detail, the data that you obtained and the code that you used to process it.

Sorry! The plot always looks like this: https://ibb.co/swrq1qD

My masked copy number segment data came from TCGA, and I ran the GISTIC analysis with the GISTIC 2.0 module in GenePattern, leaving all other parameters at their defaults. The plots are very noisy: amplification/deletion occurs in almost every gene.

You took data from the GDC? That data is segmented copy number data produced by DNAcopy, I believe. You then used that as input to GISTIC?

Could you take a look at this thread to see how it matches up with what you have done: A: How to extract the list of genes from TCGA CNV data

Following up on this part of the error:

First bad position is 10:24732567 at line 33.

As it refers to chromosome 10, perhaps one of your files is sorted lexicographically (1, 10, 11, ..., 2, 20, ...) instead of numerically (1, 2, ..., 9, 10, ...).
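A small Python sketch of numeric chromosome ordering, to illustrate the difference from plain string sorting. It assumes rows already split into fields in the column orders shown in this thread; the helper names are hypothetical, and whether GISTIC itself requires sorted input is not confirmed here.

```python
from typing import List, Tuple


def chrom_key(chrom: str) -> Tuple[int, int, str]:
    """Sort key: chromosomes 1..22 numerically, then non-numeric names (X, Y, ...) alphabetically."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_markers(rows: List[List[str]]) -> List[List[str]]:
    """Sort marker rows [probe, chromosome, position] by chromosome, then position."""
    return sorted(rows, key=lambda r: (chrom_key(r[1]), int(r[2])))


def sort_segments(rows: List[List[str]]) -> List[List[str]]:
    """Sort seg rows [sample, chromosome, start, end, ...] by sample, chromosome, then start."""
    return sorted(rows, key=lambda r: (r[0], chrom_key(r[1]), int(r[2])))


if __name__ == "__main__":
    chroms = ["1", "10", "2", "X", "22"]
    print(sorted(chroms))                 # lexicographic: ['1', '10', '2', '22', 'X']
    print(sorted(chroms, key=chrom_key))  # numeric: ['1', '2', '10', '22', 'X']
```

Note that `sort_segments` also groups rows by sample name, so the sample order in the output may differ from the input.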

Thank you for your help, Kevin! But after sorting my files numerically, I still get a similar error (... do not match any markers ...). Is it possible that the markers file I submitted doesn't fit the TCGA data, or that the online version of GISTIC2 doesn't work at all?

I think that I read somewhere, by the way, that the IDs have to be like this:

TCGA-MQ-A4LJ

That is, without the final part (-01A). Can you try?
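If you want to try that, here is a minimal Python sketch that trims the sample column to the first three barcode fields. This only implements the suggestion above, which is itself unconfirmed; the function names are hypothetical. It accepts "-" or "." as separators (the first seg file in this thread uses dots), always rejoins with "-", and writes tab-delimited output.

```python
import re
from typing import List


def patient_barcode(sample_id: str) -> str:
    """Keep the first three barcode fields, e.g. TCGA-MQ-A4LJ-01A -> TCGA-MQ-A4LJ."""
    return "-".join(re.split(r"[-.]", sample_id)[:3])


def trim_seg_ids(lines: List[str]) -> List[str]:
    """Rewrite only the sample column of whitespace-delimited seg lines; output is tab-delimited."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fields[0] = patient_barcode(fields[0])
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    print(patient_barcode("TCGA-MQ-A4LJ-01A"))  # TCGA-MQ-A4LJ
    print(patient_barcode("TCGA.05.4249.01A"))  # TCGA-05-4249
```

One caveat: if the seg file holds more than one sample per patient (for example tumour and normal), trimming collapses them into a single ID, so it may be worth keeping only one sample type before trimming.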
