Hi all. Thanks in advance for helping me out.
I'm analyzing copy number data from TCGA (using TCGAbiolinks) and want to identify genes that are either amplified or deleted.
To download the gene-level copy number data, I used the code below:
library(TCGAbiolinks)
query <- GDCquery(project = 'TCGA-BRCA', data.category = 'Copy Number Variation', data.type = 'Gene Level Copy Number', sample.type = 'Primary Tumor')
I have three questions related to the downloaded data.
First, I'm curious to know the pipeline used to calculate gene level copy numbers.
Second, I've noticed that some patients have gene-level copy numbers that are unexpectedly large. For example, 'TCGA-A8-A093-10A-01D-A012-01' had a copy number of 26 for the gene 'ENSG00000085733.16'. Is this normal?
Finally, what would be a cutoff score for gene level copy number to define whether a gene is amplified or deleted?
Thank you so much for your help.
Were you able to get answers to your questions?
This is a GDC question, not a TCGAbiolinks question. For TCGA, GDC uses ASCAT2 (SNP6) and ASCATNGS (WGS) to produce integer copy number values. The gene-level copy number is obtained simply by intersecting each gene region with the segmentation file, with some handling of edge cases. I am pretty sure this is clearly described in the GDC documentation.
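To make the "intersect gene region with segmentation file" idea concrete, here is a minimal sketch in base R. It is not the GDC implementation: the segment and gene tables are made-up toy data, the function name gene_copy_number and the column names are hypothetical, and reporting the minimum and maximum over overlapping segments is just one possible way to handle a gene that spans a breakpoint.

```r
# Toy segmentation table (hypothetical values, not real TCGA data).
segments <- data.frame(
  chrom       = c("chr1", "chr1", "chr1"),
  start       = c(1, 1001, 5001),
  end         = c(1000, 5000, 9000),
  copy_number = c(2, 4, 1)
)

# Toy gene annotation (hypothetical gene IDs and coordinates).
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  chrom   = c("chr1", "chr1", "chr1"),
  start   = c(200, 4500, 9500),
  end     = c(800, 5500, 9900)
)

# For each gene, find every segment on the same chromosome that overlaps
# the gene interval and summarise the segment copy numbers.
# Assumption: genes spanning several segments are summarised by min/max;
# genes with no overlapping segment get NA.
gene_copy_number <- function(genes, segments) {
  rows <- lapply(seq_len(nrow(genes)), function(i) {
    g <- genes[i, ]
    hit <- segments$chrom == g$chrom &
      segments$start <= g$end &
      segments$end >= g$start
    cn <- segments$copy_number[hit]
    data.frame(
      gene_id         = g$gene_id,
      n_segments      = sum(hit),
      min_copy_number = if (any(hit)) min(cn) else NA_real_,
      max_copy_number = if (any(hit)) max(cn) else NA_real_
    )
  })
  do.call(rbind, rows)
}

result <- gene_copy_number(genes, segments)
print(result)
```

In this toy input, geneA sits inside a single segment, geneB straddles a breakpoint between two segments (so its minimum and maximum differ), and geneC falls outside every segment. Those last two situations are the kind of edge cases the answer alludes to; check the GDC documentation for how they are actually resolved.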