5.1 years ago
berry
Hi,
I have TCGA segment files and I want to analyse the number of breakpoints in each sample. Does anyone know how to calculate this?
Thanks!
Hi Kevin,
You are right: I downloaded the segment files using TCGAbiolinks. The files look like this:
Sample  Chromosome  Start  End  Num_Probes  Segment_Mean
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  3218610  8355467  3166  -0.0355
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  8372558  9407175  484  -0.9326
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  9408959  21324564  6343  -0.0398
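A minimal sketch of loading rows like these in Python, assuming the file is tab-delimited with exactly the header shown (the function name `read_segments` is hypothetical). Segments are grouped per sample and chromosome and sorted by start position, which is the order you need before looking for changes between neighbouring segments.

```python
import csv
import io
from collections import defaultdict

# The three example rows posted above, re-typed as tab-delimited text.
# Assumption: the real file is tab-delimited with this exact header.
SEG_TEXT = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t3218610\t8355467\t3166\t-0.0355\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t8372558\t9407175\t484\t-0.9326\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t9408959\t21324564\t6343\t-0.0398\n"
)


def read_segments(handle):
    """Group segments by (sample, chromosome); each list is sorted by Start."""
    segments = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["Sample"], row["Chromosome"])
        segments[key].append(
            (
                int(row["Start"]),
                int(row["End"]),
                int(row["Num_Probes"]),
                float(row["Segment_Mean"]),
            )
        )
    for segs in segments.values():
        segs.sort()  # tuples sort by Start first
    return dict(segments)


segments = read_segments(io.StringIO(SEG_TEXT))
for (sample, chrom), segs in segments.items():
    print(sample, chrom, len(segs))
```

For a real file, pass an open handle instead, e.g. `with open(path, newline="") as fh: segments = read_segments(fh)`, where `path` is wherever TCGAbiolinks saved the data (or export the R data frame to TSV first).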
I see; this appears to be copy number data segmented with circular binary segmentation (CBS). You now have to define what you mean by 'breakpoint' and then search for these in the data. A 'breakpoint', I find, means different things to different people; generally, however, it can be regarded as a point of deletion or of some other structural 'anomaly', such as an inversion or translocation.
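Because CBS only starts a new segment where it detects a change-point, the crudest breakpoint count is simply the number of boundaries between consecutive segments on each chromosome (n segments give n - 1 boundaries). A minimal sketch, assuming rows of `(sample, chromosome, start, end, segment_mean)`; the function name is hypothetical. This count ignores how large each change is and depends on the segmentation parameters, so it tends to overcount.

```python
from collections import Counter

SAMPLE = "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846"
# (sample, chromosome, start, end, segment_mean) for the rows posted above.
ROWS = [
    (SAMPLE, "1", 3218610, 8355467, -0.0355),
    (SAMPLE, "1", 8372558, 9407175, -0.9326),
    (SAMPLE, "1", 9408959, 21324564, -0.0398),
]


def count_segment_boundaries(rows):
    """Naive count: every boundary between consecutive CBS segments on the
    same chromosome is one breakpoint, whatever the size of the change."""
    segments_per_chrom = Counter((sample, chrom) for sample, chrom, *_ in rows)
    per_sample = Counter()
    for (sample, _chrom), n in segments_per_chrom.items():
        per_sample[sample] += n - 1  # n segments -> n - 1 internal boundaries
    return dict(per_sample)


print(count_segment_boundaries(ROWS))
```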
Hi Kevin, what I want to do is count the number of breaks across the genome in each sample, i.e. the points where the copy number changes (amplification or deletion). I then want to see whether there is a difference or a trend in breakpoint counts between the groups of samples I defined (i.e. which samples have a higher copy number load).
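One way to express "breaks where the copy number changes" is to count a boundary only when `Segment_Mean` differs between neighbouring segments by at least some minimum amount, then summarise the per-sample counts by group. A minimal sketch under these assumptions: `min_change=0.2` is a hypothetical default (not a TCGA/GDC recommendation), the function names are hypothetical, and `"group_A"` is a placeholder label. For a formal comparison between two groups, a rank-based test such as the Mann-Whitney U test (e.g. `scipy.stats.mannwhitneyu`) is one reasonable option for count data like this.

```python
from collections import defaultdict
from statistics import median

SAMPLE = "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846"
# (sample, chromosome, start, end, segment_mean) for the rows posted above.
ROWS = [
    (SAMPLE, "1", 3218610, 8355467, -0.0355),
    (SAMPLE, "1", 8372558, 9407175, -0.9326),
    (SAMPLE, "1", 9408959, 21324564, -0.0398),
]


def count_breakpoints(rows, min_change=0.2):
    """Count boundaries where Segment_Mean changes by at least `min_change`
    between consecutive segments on the same chromosome of a sample.
    `min_change` is a hypothetical default; tune it to your data."""
    by_chrom = defaultdict(list)
    for sample, chrom, start, _end, mean in rows:
        by_chrom[(sample, chrom)].append((start, mean))
    counts = defaultdict(int)
    for (sample, _chrom), segs in by_chrom.items():
        segs.sort()  # order segments by start position
        counts[sample] += sum(
            abs(right - left) >= min_change
            for (_s1, left), (_s2, right) in zip(segs, segs[1:])
        )
    return dict(counts)


def median_by_group(counts, groups):
    """Median breakpoint count per user-defined group (sample -> group label)."""
    grouped = defaultdict(list)
    for sample, n in counts.items():
        grouped[groups[sample]].append(n)
    return {group: median(values) for group, values in grouped.items()}


counts = count_breakpoints(ROWS)
print(counts)
print(median_by_group(counts, {SAMPLE: "group_A"}))  # placeholder group label
```

With the three rows above, both boundaries change by roughly 0.9, so they count at `min_change=0.2` but not at `min_change=1.0`.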
Okay, in that case it should be relatively easy to do. You should review how these files were produced by taking a look here: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/
Keep in mind that they may have undergone extra processing steps in TCGAbiolinks; check this against the TCGAbiolinks manuscript and/or its online documentation.
Then you should understand which threshold to use to call a breakpoint or deletion. Detecting translocations and inversions from these data will be next to impossible; at the very least, you would need the original BAM files.
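Once you have settled on a threshold, a common pattern is to call each segment as loss, neutral, or gain from its `Segment_Mean` (a log2 ratio in these files) and count a breakpoint wherever the called state changes between neighbouring segments. A minimal sketch, assuming hypothetical cut-offs of -0.2/+0.2 and an optional, equally hypothetical `min_probes` filter that drops short, noisy segments before comparing neighbours; replace both with whatever the pipeline documentation supports. All function names are hypothetical.

```python
from itertools import groupby

SAMPLE = "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846"
# (sample, chromosome, start, end, num_probes, segment_mean) for the rows posted above.
ROWS = [
    (SAMPLE, "1", 3218610, 8355467, 3166, -0.0355),
    (SAMPLE, "1", 8372558, 9407175, 484, -0.9326),
    (SAMPLE, "1", 9408959, 21324564, 6343, -0.0398),
]


def call_state(segment_mean, loss_cut=-0.2, gain_cut=0.2):
    """Map a log2-ratio Segment_Mean to -1 (loss), 0 (neutral) or +1 (gain).
    The +/-0.2 cut-offs are hypothetical placeholders."""
    if segment_mean <= loss_cut:
        return -1
    if segment_mean >= gain_cut:
        return 1
    return 0


def count_state_changes(rows, min_probes=0, **cuts):
    """Count changes of called copy-number state between consecutive segments
    on the same chromosome, after dropping segments with fewer than
    `min_probes` probes (hypothetical noise filter)."""
    # Sorting by (sample, chromosome, start) makes each chromosome contiguous
    # for groupby and puts its segments in genomic order.
    kept = sorted(r for r in rows if r[4] >= min_probes)
    counts = {}
    for (sample, _chrom), segs in groupby(kept, key=lambda r: (r[0], r[1])):
        states = [call_state(r[5], **cuts) for r in segs]
        changes = sum(a != b for a, b in zip(states, states[1:]))
        counts[sample] = counts.get(sample, 0) + changes
    return counts


print(count_state_changes(ROWS))                  # neutral -> loss -> neutral
print(count_state_changes(ROWS, min_probes=500))  # 484-probe segment dropped
```

Note that this only captures copy-number changes; balanced events such as inversions or reciprocal translocations leave no trace in segment means, which is why they cannot be recovered from these files.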