PS: sorry if my English is not too good; it is not my native language
XD
Don't worry - your English is excellent.
------------------
The main input that you require for GISTIC is the segmentation file, which should have:
- (1) Sample (sample name)
- (2) Chromosome (chromosome number)
- (3) Start Position (segment start position, in bases)
- (4) End Position (segment end position, in bases)
- (5) Num markers (number of markers in segment)
- (6) Seg.CN (log2(copy number) - 1, so a copy number of 2 gives 0)
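To make the six-column layout above concrete, here is a minimal Python sketch that writes segments in that column order. I am assuming the file is tab-delimited; the function name `format_seg_rows` and the example values are made up for illustration only and do not come from any real sample.

```python
import csv
import io

def format_seg_rows(rows):
    """Return segments as tab-delimited text, one segment per line.

    Each row is (sample, chromosome, start, end, num_markers, seg_cn),
    i.e. the same order as the six columns listed above.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        if len(row) != 6:
            raise ValueError("each segment needs exactly six fields")
        writer.writerow(row)
    return buffer.getvalue()

# Illustrative (invented) segments for a single sample.
example_rows = [
    ("sample1", 1, 1, 50000, 50, 0.0),
    ("sample1", 1, 50001, 120000, 70, -1.0),
]
print(format_seg_rows(example_rows), end="")
```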
To go directly from Control-FREEC to GISTIC 2.0, I believe the best output file to use is the '*_ratio.bed' file, which can be produced by the freec2bed.pl script (see the bottom of THIS page, under the section entitled 'Translate Control-FREEC's output into Bed or Circos formats').
However, you will have to convert the copy number column in the BED output via:
log2(x) - 1
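As a rough sketch of that conversion in Python: the function name `copy_number_to_seg_cn` is hypothetical, and the `floor` argument is my own assumption to avoid taking log2 of zero for fully deleted regions. It is not something that GISTIC or Control-FREEC prescribes.

```python
import math

def copy_number_to_seg_cn(copy_number, floor=0.01):
    """Return log2(copy_number) - 1, so that a copy number of 2 maps to 0."""
    # log2(0) is undefined, so values below `floor` are clamped to it.
    # The floor of 0.01 is an arbitrary assumption; pick what suits your data.
    return math.log2(max(copy_number, floor)) - 1

for cn in (1, 2, 4):
    print(cn, copy_number_to_seg_cn(cn))
```

With this transform, a single-copy loss (copy number 1) comes out as -1, and a copy number of 4 comes out as 1.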
The problem will be to determine a value for the 'Num markers' column of the GISTIC input file. In Control-FREEC, the reads per interval are stored in the *.cpn files, I believe, and these intervals could be used as 'pseudo-markers'. Finding a way to overlap these with the BED file will be extra work for you, though - it could be done via complex BEDTools commands, or within R using GenomicRanges.
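If neither BEDTools nor GenomicRanges is convenient, the overlap can also be sketched in plain Python. This is only an illustration of the idea and not a tested pipeline: it assumes you have already parsed the window start positions from the *.cpn file and the segments from the BED file into tuples (I am not showing the parsing, since I am not certain of the exact *.cpn column layout), and it counts a window as a 'pseudo-marker' when its start falls inside the segment. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def index_window_starts(windows):
    """Group window start positions by chromosome and sort them."""
    starts = defaultdict(list)
    for chrom, start in windows:
        starts[chrom].append(start)
    for chrom in starts:
        starts[chrom].sort()
    return starts

def count_markers(segment, starts):
    """Count windows whose start lies within [seg_start, seg_end]."""
    chrom, seg_start, seg_end = segment
    positions = starts.get(chrom, [])
    # Binary search on the sorted starts gives the count without a full scan.
    return bisect_right(positions, seg_end) - bisect_left(positions, seg_start)

# Invented windows: (chromosome, window start), as if read from a *.cpn file.
windows = [("1", 0), ("1", 1000), ("1", 2000), ("1", 3000), ("2", 0)]
starts = index_window_starts(windows)
print(count_markers(("1", 1000, 2999), starts))
```

The resulting count would go into the 'Num markers' column for that segment. Whether to count by window start, midpoint, or any overlap is a choice you would have to make.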
In conclusion, it is not impossible to use Control-FREEC with GISTIC; however, it may be easier to run DNAcopy on the aligned BAM file and avoid Control-FREEC altogether. It is your choice.
See you soon
Kevin
NB - for GISTIC versions >2.0.23, no markers file is required.
Thanks, Kevin!! I will try to generate the BED file and do the overlap with GenomicRanges (and will definitely put the code here if I succeed). I didn't like the idea of using DNAcopy because, from what I have read, its specificity is very low for exome data, unlike Control-FREEC, which is more reliable. Anyway, in case I cannot do the first, I might use DNAcopy, as GISTIC is my priority in this analysis. Thank you!! Best, Daiana
Dear Daiana, did you manage to create the GISTIC segmentation file starting from the Control-FREEC output? If so, would you be so kind as to share the GenomicRanges/BEDTools commands you used to combine the pseudo-marker information contained in the *.cpn files with the BED files? Thanks a lot, A