Question

Understanding gene level copy number data from TCGAbiolinks

2.1 years ago

billyK • 0

Hi all. Thanks in advance for helping me out.

I'm analyzing copy number data from TCGA (using TCGAbiolinks) and trying to identify genes that are either amplified or deleted.

To download the gene level copy number data, I used the code below:

library(TCGAbiolinks)
query <- GDCquery(project = 'TCGA-BRCA', data.category = 'Copy Number Variation', data.type = 'Gene Level Copy Number', sample.type = 'Primary Tumor')

I have three questions related to the downloaded data.

First, I'm curious to know the pipeline used to calculate gene level copy numbers.

Secondly, I've noticed that some patients have gene level copy numbers that are unexpectedly large. For example, 'TCGA-A8-A093-10A-01D-A012-01' had a copy number of 26 for the gene "ENSG00000085733.16". I'm curious to know if this is usual.

Finally, what would be a cutoff score for gene level copy number to define whether a gene is amplified or deleted?

Thank you so much for your help.

CNV TCGA • 1.0k views

ADD COMMENT • link updated 11 months ago by Zhenyu Zhang ★ 1.2k • written 2.1 years ago by billyK • 0

Were you able to get answers to your questions?

ADD REPLY • link 11 months ago by bioinfo355 • 0

This is a GDC question, not a TCGAbiolinks question. For TCGA, GDC uses ASCAT2 (SNP6) and ASCATNGS (WGS) to produce integer copy number values. Gene level copy number is then obtained simply by intersecting each gene region with the segmentation file, with some handling of edge cases. I am pretty sure all of this is clearly described in the GDC documentation.
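To make the intersection step concrete, here is a minimal base R sketch on toy data. Everything in it is an assumption for illustration only: the segment and gene tables are invented, the helper names `weighted_median` and `gene_copy_number` are hypothetical, and the rule for genes spanning several segments (a median weighted by overlap length, plus the minimum and maximum) is one plausible way to handle that edge case, not a confirmed reproduction of the GDC pipeline. Check the GDC documentation for the actual rules.

```r
# Toy segmentation for one sample (hypothetical values, not real TCGA data).
segments <- data.frame(
  chrom       = c("chr1", "chr1", "chr1"),
  start       = c(1, 1001, 5001),
  end         = c(1000, 5000, 9000),
  copy_number = c(2, 4, 1),
  stringsAsFactors = FALSE
)

# Toy gene coordinates (hypothetical gene IDs).
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  chrom   = c("chr1", "chr1", "chr2"),
  start   = c(200, 4500, 100),
  end     = c(800, 5500, 900),
  stringsAsFactors = FALSE
)

# Median of x weighted by w: the smallest value whose cumulative
# weight reaches half of the total weight.
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Intersect one gene with the segments and summarise the overlapping calls.
gene_copy_number <- function(chrom, start, end, segs) {
  hit <- segs$chrom == chrom & segs$start <= end & segs$end >= start
  if (!any(hit)) {
    # Edge case: no segment covers the gene.
    return(c(copy_number = NA_real_,
             min_copy_number = NA_real_,
             max_copy_number = NA_real_))
  }
  s <- segs[hit, ]
  # Number of gene bases covered by each overlapping segment.
  overlap <- pmin(s$end, end) - pmax(s$start, start) + 1
  c(copy_number     = weighted_median(s$copy_number, overlap),
    min_copy_number = min(s$copy_number),
    max_copy_number = max(s$copy_number))
}

# Apply the helper to every gene and bind the results to the gene table.
rows <- lapply(seq_len(nrow(genes)), function(i) {
  gene_copy_number(genes$chrom[i], genes$start[i], genes$end[i], segments)
})
gene_level <- cbind(genes, do.call(rbind, rows))
print(gene_level)
```

In this toy example geneB straddles two segments, so its minimum and maximum differ, while geneC has no overlapping segment and comes back as NA. For real data, a range-overlap package would be a more scalable choice than this loop.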

ADD REPLY • link 11 months ago by Zhenyu Zhang ★ 1.2k