8.1 years ago
faniastrokes
Dear all, I received RNA-seq data that has already been processed. I have a txt file containing the "Official gene symbol", "TSS id" and "locus coordinates", plus the corresponding "FPKM" value. I need the "gene id" (that's fine) and also the RefSeq RNA ID (NM ....), because I need to know which isoform each FPKM value belongs to. Please help me understand how I can get this information from the gene symbol, TSS id and locus!
Always useful to post an example snippet and specify what genome this data is from.
I apologise! Human genome, in particular the MCF-7 cell line.
You may want to speak with the people who processed the data. It is possible that they generated this at the gene level, rather than the transcript level.
I'm pretty sure not all isoforms can be uniquely identified by the information you describe you have.
I think one of these might have it?
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/kgXref.txt.gz
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
This one, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz, has the chromosome, strand (+ or -), gene_id, gene_symbol and RNA accession number, but I couldn't understand the other columns. I need the locus coordinates so that I can link them to the different RNA accession numbers.
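On the other columns: refGene.txt is tab-separated with no header, and its 16 fields follow the UCSC refGene table schema (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames). `name` is the RefSeq accession and `name2` the gene symbol; as far as I can tell there is no separate Entrez gene ID column. Here is a rough Python sketch of how you might list the accessions that fall inside a locus. The sample rows, the `GENEA`/`GENEB` symbols, the `NM_9000xx` accessions and the helper names are made up for illustration, and I am assuming your locus strings look like `chr1:1001-6000` with 1-based inclusive coordinates, which you should check against your file.

```python
import csv
import io
from collections import namedtuple

# Column order of the UCSC refGene table (tab-separated, no header line).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]

Transcript = namedtuple(
    "Transcript", "accession symbol chrom strand tx_start tx_end tss"
)


def read_refgene(handle):
    """Yield one Transcript per refGene row."""
    for row in csv.reader(handle, delimiter="\t"):
        rec = dict(zip(REFGENE_COLUMNS, row))
        start, end = int(rec["txStart"]), int(rec["txEnd"])
        # txStart is 0-based and txEnd is end-exclusive, so the 1-based
        # TSS is txStart + 1 on "+" and txEnd on "-".
        tss = start + 1 if rec["strand"] == "+" else end
        yield Transcript(rec["name"], rec["name2"], rec["chrom"],
                         rec["strand"], start, end, tss)


def parse_locus(locus):
    """Split 'chr1:1001-6000' into (chrom, start, end).

    Assumption: 1-based, inclusive coordinates.
    """
    chrom, span = locus.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def candidates(transcripts, symbol, locus):
    """Transcripts of `symbol` that overlap `locus`."""
    chrom, start, end = parse_locus(locus)
    return [
        t for t in transcripts
        if t.symbol == symbol and t.chrom == chrom
        and t.tx_start < end and t.tx_end >= start
    ]


# Made-up rows in refGene layout, NOT real annotation.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["585", "NM_900001", "chr1", "+", "1000", "5000", "1200", "4800",
     "2", "1000,4000,", "2000,5000,", "0", "GENEA", "cmpl", "cmpl", "0,1,"],
    ["585", "NM_900002", "chr1", "+", "1000", "6000", "1200", "5800",
     "2", "1000,5000,", "2000,6000,", "0", "GENEA", "cmpl", "cmpl", "0,1,"],
    ["585", "NM_900003", "chr1", "-", "20000", "25000", "20100", "24900",
     "1", "20000,", "25000,", "0", "GENEB", "cmpl", "cmpl", "0,"],
])

transcripts = list(read_refgene(io.StringIO(SAMPLE)))
hits = candidates(transcripts, "GENEA", "chr1:1001-6000")
for t in hits:
    print(t.accession, t.symbol, t.strand, t.tss)
```

For the real file you would pass `gzip.open("refGene.txt.gz", "rt")` to `read_refgene` instead of the sample. Note that in the made-up example both GENEA accessions share the same TSS, which is the situation mentioned above: a symbol, TSS and locus can match several NM accessions, so this narrows the candidates rather than naming one isoform.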