I have TCGA data that needs to be reformatted according to the following:
# 'x' is a matrix of segmented output from ASCAT, with at least the
# following columns (column names are not important):
# 1: sample id
# 2: chromosome (numeric)
# 3: segment start
# 4: segment end
# 5: number of probes
# 6: total copy number
# 7: nA
# 8: nB
# 9: ploidy
# 10: contamination, aberrant cell fraction
However, I'm not sure how to do this. For one, I'm assuming nA and nB refer to allele-specific copy numbers. But TCGA only has the following data:
A segmentation file:
Sample Chromosome Start End Num_Probes Segment_Mean
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446 1 61735 98602 17 0.3913
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446 1 228706 603590 16 -0.2696
A raw copynumber data file:
Composite Element REF Signal
CN_473963 2.87
CN_473964 2.044
An allele-specific copynumber file:
Composite Element REF Signal_A Signal_B
SNP_A-8575125 1.865 0.026
SNP_A-8497791 1.843 -0.426
I'm not sure how to reformat these files into what is needed. Specifically, I don't see how I can get the allele-specific copy numbers (nA and nB) for each segment.
Does anyone have any suggestions?
Yes, you can safely add them. In short segments you sometimes do not have enough heterozygous SNPs to resolve the major and minor alleles, but the total should still be reliable. I guess that’s the reason there is a separate column for the total in your template.
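To illustrate the addition, here is a minimal sketch that assumes the segments are already loaded as plain Python dictionaries. The field names (`sample`, `chr`, `start`, `end`, `nA`, `nB`, `total`) and the nA/nB values are made up for illustration; they are not ASCAT's or TCGA's actual column names or real results.

```python
# Hypothetical per-segment records; field names and nA/nB values are
# illustrative only, not real ASCAT output.
segments = [
    {"sample": "S1", "chr": 1, "start": 61735, "end": 98602, "nA": 1, "nB": 1},
    {"sample": "S1", "chr": 1, "start": 228706, "end": 603590, "nA": 2, "nB": 0},
]


def add_total_copy_number(segs):
    """Return copies of the segments with total copy number set to nA + nB."""
    return [dict(seg, total=seg["nA"] + seg["nB"]) for seg in segs]


with_total = add_total_copy_number(segments)

# Print the fields in the order the template lists them: total, then nA, then nB.
for seg in with_total:
    print(seg["sample"], seg["chr"], seg["start"], seg["end"],
          seg["total"], seg["nA"], seg["nB"])
```

The function returns new dictionaries rather than modifying the input, so the original segment records stay untouched if you need them later.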
Do you know how I might get the number of probes in each segment? I do not see an ASCAT output for that.