Hello, I am looking for a good data file to convert between RefSeq (specifically, accession identifiers such as NM_...) and Ensembl transcript IDs (ENST...). I am looking for a file which can be downloaded easily via a simple script using FTP. Also, I would prefer it to be some form of text file with a decent format, as I would essentially be parsing the ENST and NM_ parts and inserting them into a MySQL table.
Are you looking to store ENST and NM_ IDs for the same sequence? Or do you want to store GenBank and Ensembl entries?
Hmm, I guess the latter. I've been looking through the GenBank data files, and they have large data files for each chromosome. These files do have ENST -> NM_ mappings for every transcript on each chromosome; however, I feel like using these data files would not be efficient. Not only are they large and slow to download, but a parser script would also take quite a while to run, even though I simply want to create a tab-delimited text file where the ENST ID is in the first column and its corresponding NM_ ID is in the second.
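For reference, the two-column file described above could be produced with something like the following. This is only a rough sketch: it assumes that an ENST ID and its matching NM_ ID appear on the same line of the input, which may not hold for the real GenBank files, and the names `extract_pairs` and `write_tsv` are made up for illustration.

```python
import re

# Assumed ID shapes: "ENST" + 11 digits and "NM_" + digits,
# each with an optional ".version" suffix.
ENST_RE = re.compile(r"ENST\d{11}(?:\.\d+)?")
NM_RE = re.compile(r"NM_\d+(?:\.\d+)?")


def extract_pairs(lines):
    """Yield (enst, nm) for every line that mentions both kinds of ID.

    Assumption: IDs found on the same line belong together.
    """
    for line in lines:
        ensts = ENST_RE.findall(line)
        nms = NM_RE.findall(line)
        for enst in ensts:
            for nm in nms:
                yield enst, nm


def write_tsv(pairs, path):
    """Write unique pairs as 'ENST<TAB>NM_' lines, sorted for stable output."""
    with open(path, "w") as out:
        for enst, nm in sorted(set(pairs)):
            out.write(f"{enst}\t{nm}\n")
```

Reading the input line by line, e.g. `write_tsv(extract_pairs(open("input.txt")), "enst_to_nm.tsv")` with a hypothetical `input.txt`, keeps memory use low even for large files.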
I'd suggest tinkering with the UCSC Genome Browser's public MySQL database. You should be able to write a query/script that, given ID1, does a bunch of SELECTs for ID2.
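A minimal sketch of that kind of lookup in Python is below. Treat the specifics as assumptions to check against the UCSC schema before relying on them: it assumes the `hg38` database has a `knownToRefSeq` table whose `name` column holds the ENST ID and whose `value` column holds the RefSeq accession, and it uses the third-party `pymysql` package as the client. The helper names `like_pattern` and `lookup_refseq` are hypothetical.

```python
# Assumed table and columns: knownToRefSeq(name, value) in the hg38 database.
QUERY = "SELECT name, value FROM knownToRefSeq WHERE name LIKE %s"


def like_pattern(enst_id):
    """Build a LIKE pattern that matches the ID with or without a version."""
    base = enst_id.split(".", 1)[0]  # drop any ".N" version suffix
    return base + "%"


def lookup_refseq(enst_id, db="hg38"):
    """Return the RefSeq accessions the table lists for one ENST ID."""
    import pymysql  # third-party client, imported lazily (pip install pymysql)

    # UCSC's public server: user "genome", no password.
    conn = pymysql.connect(
        host="genome-mysql.soe.ucsc.edu", user="genome", database=db
    )
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY, (like_pattern(enst_id),))
            return [value for _name, value in cur.fetchall()]
    finally:
        conn.close()
```

If the goal is the full mapping rather than single lookups, dropping the WHERE clause and dumping the whole table once would be kinder to the public server than issuing one SELECT per ID.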