4.4 years ago
prithvi.mastermind
I have retrieved TCGA raw read counts for oral cancer from UCSC Xena. The counts file is indexed by Ensembl IDs, and the Xena browser also provides, at the same link, a file that maps the Ensembl IDs in the expression file to their gene symbols. In many cases a single gene symbol corresponds to several different Ensembl IDs. How should I remove these duplicates? Is averaging the duplicate rows within each sample the best option, or is there a better approach?
I've found that in many cases where a single HGNC symbol corresponds to multiple ENSG IDs, only one of those ENSG IDs is located on the primary assembly (chromosomes 1-22, X, and Y). You may want to restrict the entries to the primary assembly first and then map the remaining ENSG IDs to HGNC symbols.
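A minimal sketch of that filter-then-map step, assuming you have a chromosome (or scaffold) name for each Ensembl ID from a gene annotation of your choice; the Xena mapping file described in the question may not include one. The function names, the placeholder IDs, and the scaffold name `ALT_SCAFFOLD` are hypothetical. It also reports any symbols that are still shared by several IDs after filtering, so you can see what is left to resolve.

```python
from collections import defaultdict

# Primary assembly as defined above: chromosomes 1-22, X and Y.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def normalize_chrom(chrom):
    """Strip a leading 'chr' so 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def map_primary_only(id_to_symbol, id_to_chrom):
    """Keep Ensembl IDs on primary-assembly chromosomes, then map to symbols.

    Returns (kept, duplicates): kept maps each retained Ensembl ID to its
    symbol; duplicates maps every symbol still shared by more than one
    retained ID to the sorted list of those IDs.
    IDs with no chromosome information are treated as non-primary.
    """
    kept = {
        ens_id: symbol
        for ens_id, symbol in id_to_symbol.items()
        if normalize_chrom(id_to_chrom.get(ens_id, "")) in PRIMARY
    }
    by_symbol = defaultdict(list)
    for ens_id, symbol in kept.items():
        by_symbol[symbol].append(ens_id)
    duplicates = {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}
    return kept, duplicates


# Toy data with placeholder IDs, symbols and a placeholder non-primary scaffold.
id_to_symbol = {"ENSG_A": "GENE1", "ENSG_B": "GENE1",
                "ENSG_C": "GENE2", "ENSG_D": "GENE2"}
id_to_chrom = {"ENSG_A": "6", "ENSG_B": "ALT_SCAFFOLD",
               "ENSG_C": "chr1", "ENSG_D": "1"}
kept, duplicates = map_primary_only(id_to_symbol, id_to_chrom)
print(kept)
print(duplicates)
```

In this toy example the GENE1 duplicate disappears because one of its IDs is off the primary assembly, whereas GENE2 stays duplicated; as the answer says "in many cases", some duplicates may survive the filter and still need a separate decision.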