8.2 years ago
ddzhangzz
I downloaded TCGA RNA-Seq isoform data and want to know how to map the UCSC isoform IDs to a gene's named isoforms. Here is an example. Suppose I have the UCSC IDs for the gene SET:
   id          cds      db    geneName  raw_count  scaled_estimate
1  uc004bvt.3  0:04:02  hg19  SET       1152.45    3.14566448560844E-05
2  uc004bvu.3  0:05:54  hg19  SET       3334.8     8.86030965793767E-05
3  uc010myg.2           hg19  SET       78.63      2.12248781527862E-06
4  uc011mbj.1  0:00:13  hg19  SET       2121.49    6.52613657861146E-05
The gene SET has at least two named isoforms, SET-alpha (TAF-I alpha) and SET-beta (TAF-I beta). I would like to know how to map each ID (e.g. uc004bvt.3) to these two named isoform symbols.
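I'm not sure what you mean by mapping, but if you just want two columns with id and geneName, you can use this (assuming the file is tab-delimited and has no row-number column, so the empty cds field in the third row does not shift the columns):
awk -F'\t' '{print $1, $4}' FILE_NAME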
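For illustration, here is a minimal sketch of that column extraction on the four rows from the question. It assumes the file is tab-delimited and has no row-number column, which the thread does not confirm, and `isoforms.tsv` is a hypothetical file name. Splitting on tabs matters here because the third row has an empty cds field, which would shift the columns under awk's default whitespace splitting.

```shell
# Recreate the rows from the question as a tab-delimited file.
# isoforms.tsv is a hypothetical name; the real TCGA file may be laid out differently.
printf 'id\tcds\tdb\tgeneName\traw_count\tscaled_estimate\n' > isoforms.tsv
printf 'uc004bvt.3\t0:04:02\thg19\tSET\t1152.45\t3.14566448560844E-05\n' >> isoforms.tsv
printf 'uc004bvu.3\t0:05:54\thg19\tSET\t3334.8\t8.86030965793767E-05\n' >> isoforms.tsv
printf 'uc010myg.2\t\thg19\tSET\t78.63\t2.12248781527862E-06\n' >> isoforms.tsv
printf 'uc011mbj.1\t0:00:13\thg19\tSET\t2121.49\t6.52613657861146E-05\n' >> isoforms.tsv

# Split on tabs only so the empty cds field keeps its place;
# NR > 1 skips the header line.
pairs=$(awk -F'\t' 'NR > 1 {print $1, $4}' isoforms.tsv)
printf '%s\n' "$pairs"
```

This only pairs each UCSC ID with its gene symbol. It does not say which ID corresponds to SET-alpha or SET-beta; that would need an external annotation table.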