10.0 years ago
pwg46
Hello, I am looking for a data file that I can parse locally to map between ENST and RefSeq transcript identifiers. Please don't link me to BioMart or give me an SQL query. As I said, I want the actual raw data file, which I can keep locally and parse on my own. For example, I found that human.protein.faa in RefSeq's database is good for UniProt<-->RefSeq protein conversions. If any of you know of a good data file for ENSG<-->RefSeq gene conversions, that would be great as well.
Just save the BioMart output for your organism of interest to a file. Then you can parse everything locally as many times as you like without the network delay.
Is this what you are asking for, the transcript file from Ensembl?
No, that's not what pwg46 was asking for. The refseq to ensembl mappings are in one of the database tables on the FTP site, but I haven't a clue which one.
Is it readily available like that? I always thought I needed to do it on my own using some ID converter.
Yes, one can just download the database table...if you can find out which table you need (this is the case for UCSC too). The Ensembl database is large enough that it's simpler to just use biomart and save the results to a file. You then have a flat file with all of the conversions in a species. Why OP doesn't just want to do that (it's the quick and easy route) is beyond me.
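For what it's worth, parsing a saved export locally could look roughly like the sketch below. It assumes a tab-separated file with a header row; the column names ("Transcript stable ID", "RefSeq mRNA ID") and the function name `load_mapping` are assumptions for illustration, so check them against the header of the file you actually saved. The IDs in the sample are made-up placeholders, not real identifiers.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle, key_col, value_col):
    """Build {key: set(values)} from a tab-separated table with a header row.

    Rows where either column is empty or missing are skipped, since many
    Ensembl transcripts have no RefSeq counterpart.
    """
    mapping = defaultdict(set)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        key = (row.get(key_col) or "").strip()
        value = (row.get(value_col) or "").strip()
        if key and value:
            # One transcript can map to several RefSeq IDs, so collect a set.
            mapping[key].add(value)
    return dict(mapping)


# Made-up placeholder rows standing in for a saved export (assumed headers).
SAMPLE = (
    "Transcript stable ID\tRefSeq mRNA ID\n"
    "ENST_A\tNM_A\n"
    "ENST_A\tNM_B\n"
    "ENST_B\tNM_C\n"
    "ENST_C\t\n"
)

enst_to_refseq = load_mapping(
    io.StringIO(SAMPLE), "Transcript stable ID", "RefSeq mRNA ID"
)
for enst, refseqs in sorted(enst_to_refseq.items()):
    print(enst, ",".join(sorted(refseqs)))
```

For a real file, replace the `io.StringIO(SAMPLE)` handle with `open(path, newline="")`. Swapping the two column arguments gives the reverse lookup, and the same function should work for ENSG to RefSeq gene conversions if the export includes those columns.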