I am analyzing CNV data downloaded from the TCGA database (level 3) and want to convert it to a gene-level matrix.
The files look like this:
Sample Chromosome Start End Num_Probes Segment_Mean
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774 1 3218610 16796721 7253 -0.0198
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774 1 16796742 17763566 312 -0.3615
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774 1 17764034 221905958 105172 -0.0073
To convert the CNV data to gene-level data, I map genomic segments to genes. In some cases, two segments with different 'Segment_Mean' values overlap the same gene. In that case, is it correct to use the average of the 'Segment_Mean' values for that gene?
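To make the question concrete, here is a minimal Python sketch of one possible way to do the averaging: weight each 'Segment_Mean' by the number of bases of the gene that the segment covers, rather than taking a plain mean. This is only an illustration, not an established method. The function name `gene_level_value` and the gene coordinates are hypothetical, the segments are the three rows shown above, and everything is assumed to be on the same chromosome.

```python
def gene_level_value(segments, gene_start, gene_end):
    """Overlap-weighted mean of Segment_Mean for one gene.

    segments: list of (start, end, segment_mean) tuples on the gene's chromosome.
    Returns None if no segment overlaps the gene.
    """
    weighted_sum = 0.0
    total_overlap = 0
    for seg_start, seg_end, seg_mean in segments:
        # Number of bases shared by the segment and the gene (inclusive coordinates)
        overlap = min(seg_end, gene_end) - max(seg_start, gene_start) + 1
        if overlap > 0:
            weighted_sum += overlap * seg_mean
            total_overlap += overlap
    if total_overlap == 0:
        return None
    return weighted_sum / total_overlap


# The three chromosome 1 segments from the example file
segments = [
    (3218610, 16796721, -0.0198),
    (16796742, 17763566, -0.3615),
    (17764034, 221905958, -0.0073),
]

# Hypothetical gene that spans the boundary between the first two segments
value = gene_level_value(segments, 16790000, 16800000)
print(value)
```

With this weighting, a gene that lies mostly in one segment gets a value close to that segment's 'Segment_Mean', whereas a plain average would treat both segments equally regardless of how much of the gene each one covers.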
Any thoughts?
Note that the data were obtained using the SNP Array 6.0.
Thanks
@Chris Miller, can you recommend some references on interpreting CNV data? I am new to this field.