Question

TCGA CNV data reformatting

4.3 years ago

vctrm67 ▴ 80

I have TCGA data that needs to be reformatted according to the following:

# 'x' is a matrix of segmented output from ASCAT, with at least the
#   following columns (column names are not important):
# 1: sample id
# 2: chromosome (numeric)
# 3: segment start
# 4: segment end
# 5: number of probes
# 6: total copy number
# 7: nA
# 8: nB
# 9: ploidy
# 10: contamination, aberrant cell fraction

However, I'm not sure how to do this. For one, I'm assuming nA and nB refer to allele-specific copy numbers. But TCGA only has the following data:

A segmentation file:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446      1       61735   98602   17      0.3913
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446      1       228706  603590  16      -0.2696

A raw copy number data file:

Composite Element REF   Signal
CN_473963       2.87
CN_473964       2.044

An allele-specific copy number file:

Composite Element REF   Signal_A        Signal_B
SNP_A-8575125   1.865   0.026
SNP_A-8497791   1.843   -0.426

I'm not sure how to reformat these files into what is needed. Specifically, I don't see how I can get the allele-specific copy numbers (nA and nB) for each segment.

Does anyone have any suggestions?

Yes, you can safely add them. In short segments there are sometimes not enough heterozygous SNPs to resolve the major and minor copy numbers, but the total should still be reliable. I guess that's why your template has a separate column for the total.
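A minimal sketch of that addition in Python, in case it helps. Only the "nMajor" and "nMinor" headers come from this thread. The other column names, the example values, the ploidy and aberrant cell fraction, and the mapping of nMajor/nMinor onto the template's nA/nB are assumptions for illustration, so check them against your own ASCAT output.

```python
import csv
import io

# Hypothetical ASCAT segment table. Only "nMajor" and "nMinor" are header
# names taken from the thread; the other headers and all values are made up.
ascat_segments = io.StringIO(
    "sample\tchr\tstartpos\tendpos\tnMajor\tnMinor\n"
    "S1\t1\t61735\t98602\t2\t1\n"
    "S1\t1\t228706\t603590\t1\t0\n"
)

def to_template_rows(handle, ploidy, aberrant_fraction, n_probes=None):
    """Build rows in the 10-column layout described in the question.

    Assumes nA = nMajor and nB = nMinor, and numeric chromosome labels
    (X/Y would need to be mapped to numbers first).
    """
    rows = []
    for i, seg in enumerate(csv.DictReader(handle, delimiter="\t")):
        n_major = int(seg["nMajor"])
        n_minor = int(seg["nMinor"])
        rows.append([
            seg["sample"],                     # 1: sample id
            int(seg["chr"]),                   # 2: chromosome (numeric)
            int(seg["startpos"]),              # 3: segment start
            int(seg["endpos"]),                # 4: segment end
            n_probes[i] if n_probes else None, # 5: number of probes, if known
            n_major + n_minor,                 # 6: total copy number
            n_major,                           # 7: nA (assumed = nMajor)
            n_minor,                           # 8: nB (assumed = nMinor)
            ploidy,                            # 9: ploidy
            aberrant_fraction,                 # 10: aberrant cell fraction
        ])
    return rows

# Placeholder ploidy and aberrant cell fraction values, not real estimates.
rows = to_template_rows(ascat_segments, ploidy=2.1, aberrant_fraction=0.8)
for row in rows:
    print(row)
```

Because the total is computed per row, it stays on the same segment boundaries as the nMajor and nMinor values it was derived from.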

ADD REPLY • link 4.3 years ago by markus.riester ▴ 550

Do you know how I might get the number of probes in each segment? I do not see an ASCAT output for that.
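One possible approach, sketched in Python, is to count the probes whose positions fall inside each segment. This assumes you have a table of probe positions per chromosome (for example, the SNP position table used as ASCAT input). The function name, data layout, and example positions below are hypothetical.

```python
from bisect import bisect_left, bisect_right

def count_probes(segments, probe_positions):
    """Count probes inside each segment, with inclusive boundaries.

    segments: list of (chromosome, start, end) tuples.
    probe_positions: dict mapping chromosome -> list of probe positions.
    """
    # Sort once per chromosome so each segment needs two binary searches.
    sorted_pos = {chrom: sorted(pos) for chrom, pos in probe_positions.items()}
    counts = []
    for chrom, start, end in segments:
        pos = sorted_pos.get(chrom, [])
        counts.append(bisect_right(pos, end) - bisect_left(pos, start))
    return counts

# Made-up probe positions on chromosome 1, for illustration only.
probe_positions = {1: [61735, 70000, 98602, 228706, 300000, 603590, 700000]}
segments = [(1, 61735, 98602), (1, 228706, 603590)]
probe_counts = count_probes(segments, probe_positions)
print(probe_counts)
```

Whether the count should include only heterozygous SNPs or all probes depends on what the downstream tool expects, so that choice is left open here.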

ADD REPLY • link 4.3 years ago by vctrm67 ▴ 80

score 0 · Answer 1 · 2020-07-20

4.3 years ago

markus.riester ▴ 550

You will need the ABSOLUTE or ASCAT output. Several TCGA pan-cancer papers provide ABSOLUTE calls for most, if not all, of the datasets. ABSOLUTE will give you the missing columns 6 to 10.

Thank you. I ran ASCAT, and the segmented output file has two columns named "nMajor" and "nMinor". Could I just add these to get the total copy number? I am hesitant because the resulting totals might apply to segment regions that differ from the ones ASCAT identified.

ADD REPLY • link 4.3 years ago by vctrm67 ▴ 80