Looks like NCBI eutils is able to find/map all of them. I put your identifiers in a file called new (one ID per line).
for i in `cat ./new`; do efetch -db nuccore -id $i -format fasta | grep ">"; done
>XM_017006279.1 PREDICTED: Homo sapiens eukaryotic translation initiation factor 4E family member 3 (EIF4E3), transcript variant X3, mRNA
>XM_017008472.1 PREDICTED: Homo sapiens ATPase phospholipid transporting 10D (putative) (ATP10D), transcript variant X6, mRNA
>NM_021088.3 Homo sapiens zinc finger protein 2 (ZNF2), transcript variant 1, mRNA
>XM_005268372.4 PREDICTED: Homo sapiens PPARG coactivator 1 beta (PPARGC1B), transcript variant X4, mRNA
>XM_006716584.1 PREDICTED: Homo sapiens ectonucleotide pyrophosphatase/phosphodiesterase 2 (ENPP2), transcript variant X1, mRNA
>XM_011517885.2 PREDICTED: Homo sapiens ankyrin repeat domain 18B (ANKRD18B), transcript variant X3, mRNA
>XM_017019593.1 PREDICTED: Homo sapiens vezatin, adherens junctions transmembrane protein (VEZT), transcript variant X23, mRNA
>XM_017004256.1 PREDICTED: Homo sapiens THAP domain containing 4 (THAP4), transcript variant X3, mRNA
>NM_001319135.1 Homo sapiens staufen double-stranded RNA binding protein 1 (STAU1), transcript variant T7, mRNA
>XM_017014384.1 PREDICTED: Homo sapiens lysophosphatidic acid receptor 1 (LPAR1), transcript variant X2, mRNA
>XM_017008367.1 PREDICTED: Homo sapiens transmembrane protein 144 (TMEM144), transcript variant X5, mRNA
>NM_001243328.1 Homo sapiens retinoic acid early transcript 1E (RAET1E), transcript variant 4, mRNA
>NM_001330480.1 Homo sapiens piwi like RNA-mediated gene silencing 2 (PIWIL2), transcript variant 3, mRNA
>NR_024577.2 Homo sapiens Sec61 translocon alpha 2 subunit (SEC61A2), transcript variant 4, non-coding RNA
>NR_033759.1 Homo sapiens ATP synthase membrane subunit g (ATP5MG), transcript variant 2, non-coding RNA
>NR_104486.1 Homo sapiens zinc finger DHHC-type containing 20 (ZDHHC20), transcript variant 3, non-coding RNA
>XM_011521371.1 PREDICTED: Homo sapiens FANCD2/FANCI-associated nuclease 1 (FAN1), transcript variant X8, mRNA
>XM_017008823.1 PREDICTED: Homo sapiens protein arginine methyltransferase 9 (PRMT9), transcript variant X1, mRNA
>XM_017022396.1 PREDICTED: Homo sapiens HAUS augmin like complex subunit 2 (HAUS2), transcript variant X1, mRNA
>NM_000016.5 Homo sapiens acyl-CoA dehydrogenase medium chain (ACADM), transcript variant 1, mRNA
>NM_000019.3 Homo sapiens acetyl-CoA acetyltransferase 1 (ACAT1), mRNA
>NM_000028.2 Homo sapiens amylo-alpha-1, 6-glucosidase, 4-alpha-glucanotransferase (AGL), transcript variant 4, mRNA
>NM_000046.4 Homo sapiens arylsulfatase B (ARSB), transcript variant 1, mRNA
>NM_000048.3 Homo sapiens argininosuccinate lyase (ASL), transcript variant 2, mRNA
>NM_000075.3 Homo sapiens cyclin dependent kinase 4 (CDK4), mRNA
>NM_000081.3 Homo sapiens lysosomal trafficking regulator (LYST), transcript variant 1, mRNA
>NM_000084.4 Homo sapiens chloride voltage-gated channel 5 (CLCN5), transcript variant 3, mRNA
>NM_000107.2 Homo sapiens damage specific DNA binding protein 2 (DDB2), transcript variant WT, mRNA
>NM_000113.2 Homo sapiens torsin family 1 member A (TOR1A), mRNA
>NM_000121.3 Homo sapiens erythropoietin receptor (EPOR), transcript variant 1, mRNA
>NM_000126.3 Homo sapiens electron transfer flavoprotein alpha subunit (ETFA), transcript variant 1, mRNA
>NM_000132.3 Homo sapiens coagulation factor VIII (F8), transcript variant 1, mRNA
>NM_000137.2 Homo sapiens fumarylacetoacetate hydrolase (FAH), mRNA
>NM_000140.3 Homo sapiens ferrochelatase (FECH), transcript variant 2, mRNA
>NM_000142.4 Homo sapiens fibroblast growth factor receptor 3 (FGFR3), transcript variant 1, mRNA
>NM_000144.4 Homo sapiens frataxin (FXN), transcript variant 1, mRNA
>NM_000157.3 Homo sapiens glucosylceramidase beta (GBA), transcript variant 1, mRNA
>NM_000175.4 Homo sapiens glucose-6-phosphate isomerase (GPI), transcript variant 2, mRNA
>NM_000183.2 Homo sapiens hydroxyacyl-CoA dehydrogenase trifunctional multienzyme complex subunit beta (HADHB), transcript variant 1, mRNA
If DAVID needs the version numbers, the latest versions are the ones shown in the list above.
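For a longer list (a few hundred IDs), a `while read` loop tends to be a bit more forgiving than looping over the output of `cat`. The sketch below assumes that NCBI's EDirect efetch is the one on your PATH. The function name `fetch_headers` and the file name `ids.txt` are placeholders made up for illustration, not part of any tool. Redirecting efetch's stdin from /dev/null is a precaution so that it cannot read the rest of the ID list itself.

```sh
# fetch_headers FILE
# Print the FASTA header line for every accession listed in FILE (one per line).
# Assumes `efetch` is NCBI EDirect's efetch; the function name is hypothetical.
fetch_headers() {
    while IFS= read -r id || [ -n "$id" ]; do
        # Strip any carriage returns in case the file was saved on Windows.
        id=$(printf '%s' "$id" | tr -d '\r')
        # Skip blank lines.
        [ -z "$id" ] && continue
        # </dev/null keeps efetch from reading the ID list on stdin.
        efetch -db nuccore -id "$id" -format fasta </dev/null | grep '>'
    done < "$1"
}
```

Usage would be something like `fetch_headers ids.txt > headers.txt`, which writes one header line per ID that efetch can resolve.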
Great, thank you very much! Much appreciated.
Hi there, due to an error I've realised I actually have to do this for around 300 transcript IDs. I'm trying to use your command in Linux but struggling to understand how you did this. Is nuccore a file you could share with me, please?
nuccore is the nucleotide database at NCBI that eutils is searching. Put your query IDs in a file (call it any name you want, one per line, plain text). Then replace the file name I used (new) in the command above with your file name.
Hi, thanks for getting back to me. I'm still having issues. Here is the error with one ID (the same error occurs repeatedly when I use my query ID file in the for loop). It seems to want an index file because -i is the option for index. No idea what's going wrong.
efetch -db nuccore -id NM_021088.3 -format fasta
Missing idxfile for option -i.
EFETCH - retrieve entries from sequence databases.
Synopsis: efetch -options [database:]<query>
Databases: SWissprot/SP, PIR, WOrmpep/WP, EMbl, GEnbank/GB, ProDom, ProSite
Options:
  -a      Search with Accession number
  -f      Fasta format output
  -q      Sequence only output (one line)
  -s <#>  Start at position #
  -e <#>  Stop at position #
  -o      More options and info...
Environment:
  SWDIR      = SwissProt directory - database and EMBL index files
  PIRDIR     = PIR -- " --
  WORMDIR    = Wormpep -- " --
  EMBLDIR    = EMBL -- " --
  GBDIR      = Genbank -- " --
  PRODOMDIR  = ProDom -- " --
  PROSITEDIR = ProSite -- " --
  DBDIR      = User's own -- " -- (fasta format)
  SEQDB      database file (default SwissProt)
  SEQDBIDX   index file
  DIVTABL    division lookup table
Ex. setenv DBDIR /pubseq/seqlibs/embl/
Note that Prodom family consensus seqs can be fetched by PD:_#
by Erik Sonnhammer (esr@sanger.ac.uk) Version 2.1,
I have formatted all of my IDs into a file like this (one ID per line)
You are not using efetch from NCBI unix utils. Follow the directions here to download them. It appears that you have some other program called efetch from EBI-Sanger Center.
I can do this fine with NCBI's efetch:
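To see which efetch your shell actually picks up, a quick check along these lines may help. This is only a sketch: the function name `check_efetch` is made up, and the test relies on the assumption that NCBI's EDirect installs esearch in the same directory as efetch, so it is a heuristic rather than a definitive test.

```sh
# check_efetch
# Report which `efetch` the shell would run and whether it looks like EDirect.
# Heuristic (an assumption): EDirect ships esearch next to efetch.
# Returns 0 if it looks like EDirect, 1 if no efetch is found, 2 otherwise.
check_efetch() {
    path=$(command -v efetch) || { echo "no efetch found on PATH"; return 1; }
    echo "efetch resolves to: $path"
    dir=$(dirname "$path")
    if [ -x "$dir/esearch" ]; then
        echo "esearch found alongside it; this looks like NCBI EDirect"
        return 0
    fi
    echo "no esearch in $dir; this may be a different program named efetch"
    return 2
}
```

If it reports a different program, putting the EDirect directory earlier in PATH (or calling efetch by its full path) should make the loop use the NCBI version.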
Got it working in the end. Thank you very much for all your help. This is contributing towards data analysis for my BSc dissertation. Would you like to give me your credentials so your help can be acknowledged in my dissertation?