NM_transcript ID gene identification
6.7 years ago
Abiterkuile ▴ 30

I have 76 differentially expressed human RefSeq transcript IDs (mostly NM transcript IDs, but some are XM).

I wanted to identify the gene names for these transcripts using DAVID; however, it doesn't seem to recognise NM or XM transcript IDs.

I used PANTHER and it identified 36; however, 39 were not identified (list below):

XM_017006279 XM_017008472 NM_021088 XM_005268372 XM_006716584 XM_011517885 XM_017019593 XM_017004256 NM_001319135 XM_017014384 XM_017008367 NM_001243328 NM_001330480 NR_024577 NR_033759 NR_104486 XM_011521371 XM_017008823 XM_017022396 NM_000016 NM_000019 NM_000028 NM_000046 NM_000048 NM_000075 NM_000081 NM_000084 NM_000107 NM_000113 NM_000121 NM_000126 NM_000132 NM_000137 NM_000140 NM_000142 NM_000144 NM_000157 NM_000175 NM_000183

What would be the most suitable way to identify the remaining transcripts? Does anyone have any other databases they would recommend?

Should I try to convert the NM transcript IDs to a format that DAVID recognises?

Thanks

rna-seq gene ontology panther refseq DAVID
6.7 years ago
GenoMax 147k

Looks like NCBI eutils is able to find/map all of them. I put your identifiers in a file called new (one ID per line).

for i in `cat ./new`; do efetch -db nuccore -id $i -format fasta | grep ">"; done
>XM_017006279.1 PREDICTED: Homo sapiens eukaryotic translation initiation factor 4E family member 3 (EIF4E3), transcript variant X3, mRNA
>XM_017008472.1 PREDICTED: Homo sapiens ATPase phospholipid transporting 10D (putative) (ATP10D), transcript variant X6, mRNA
>NM_021088.3 Homo sapiens zinc finger protein 2 (ZNF2), transcript variant 1, mRNA
>XM_005268372.4 PREDICTED: Homo sapiens PPARG coactivator 1 beta (PPARGC1B), transcript variant X4, mRNA
>XM_006716584.1 PREDICTED: Homo sapiens ectonucleotide pyrophosphatase/phosphodiesterase 2 (ENPP2), transcript variant X1, mRNA
>XM_011517885.2 PREDICTED: Homo sapiens ankyrin repeat domain 18B (ANKRD18B), transcript variant X3, mRNA
>XM_017019593.1 PREDICTED: Homo sapiens vezatin, adherens junctions transmembrane protein (VEZT), transcript variant X23, mRNA
>XM_017004256.1 PREDICTED: Homo sapiens THAP domain containing 4 (THAP4), transcript variant X3, mRNA
>NM_001319135.1 Homo sapiens staufen double-stranded RNA binding protein 1 (STAU1), transcript variant T7, mRNA
>XM_017014384.1 PREDICTED: Homo sapiens lysophosphatidic acid receptor 1 (LPAR1), transcript variant X2, mRNA
>XM_017008367.1 PREDICTED: Homo sapiens transmembrane protein 144 (TMEM144), transcript variant X5, mRNA
>NM_001243328.1 Homo sapiens retinoic acid early transcript 1E (RAET1E), transcript variant 4, mRNA
>NM_001330480.1 Homo sapiens piwi like RNA-mediated gene silencing 2 (PIWIL2), transcript variant 3, mRNA
>NR_024577.2 Homo sapiens Sec61 translocon alpha 2 subunit (SEC61A2), transcript variant 4, non-coding RNA
>NR_033759.1 Homo sapiens ATP synthase membrane subunit g (ATP5MG), transcript variant 2, non-coding RNA
>NR_104486.1 Homo sapiens zinc finger DHHC-type containing 20 (ZDHHC20), transcript variant 3, non-coding RNA
>XM_011521371.1 PREDICTED: Homo sapiens FANCD2/FANCI-associated nuclease 1 (FAN1), transcript variant X8, mRNA
>XM_017008823.1 PREDICTED: Homo sapiens protein arginine methyltransferase 9 (PRMT9), transcript variant X1, mRNA
>XM_017022396.1 PREDICTED: Homo sapiens HAUS augmin like complex subunit 2 (HAUS2), transcript variant X1, mRNA
>NM_000016.5 Homo sapiens acyl-CoA dehydrogenase medium chain (ACADM), transcript variant 1, mRNA
>NM_000019.3 Homo sapiens acetyl-CoA acetyltransferase 1 (ACAT1), mRNA
>NM_000028.2 Homo sapiens amylo-alpha-1, 6-glucosidase, 4-alpha-glucanotransferase (AGL), transcript variant 4, mRNA
>NM_000046.4 Homo sapiens arylsulfatase B (ARSB), transcript variant 1, mRNA
>NM_000048.3 Homo sapiens argininosuccinate lyase (ASL), transcript variant 2, mRNA
>NM_000075.3 Homo sapiens cyclin dependent kinase 4 (CDK4), mRNA
>NM_000081.3 Homo sapiens lysosomal trafficking regulator (LYST), transcript variant 1, mRNA
>NM_000084.4 Homo sapiens chloride voltage-gated channel 5 (CLCN5), transcript variant 3, mRNA
>NM_000107.2 Homo sapiens damage specific DNA binding protein 2 (DDB2), transcript variant WT, mRNA
>NM_000113.2 Homo sapiens torsin family 1 member A (TOR1A), mRNA
>NM_000121.3 Homo sapiens erythropoietin receptor (EPOR), transcript variant 1, mRNA
>NM_000126.3 Homo sapiens electron transfer flavoprotein alpha subunit (ETFA), transcript variant 1, mRNA
>NM_000132.3 Homo sapiens coagulation factor VIII (F8), transcript variant 1, mRNA
>NM_000137.2 Homo sapiens fumarylacetoacetate hydrolase (FAH), mRNA
>NM_000140.3 Homo sapiens ferrochelatase (FECH), transcript variant 2, mRNA
>NM_000142.4 Homo sapiens fibroblast growth factor receptor 3 (FGFR3), transcript variant 1, mRNA
>NM_000144.4 Homo sapiens frataxin (FXN), transcript variant 1, mRNA
>NM_000157.3 Homo sapiens glucosylceramidase beta (GBA), transcript variant 1, mRNA
>NM_000175.4 Homo sapiens glucose-6-phosphate isomerase (GPI), transcript variant 2, mRNA
>NM_000183.2 Homo sapiens hydroxyacyl-CoA dehydrogenase trifunctional multienzyme complex subunit beta (HADHB), transcript variant 1, mRNA

In case DAVID needs the version numbers, the latest are in the list above.
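If only the gene symbols are needed, the deflines above can be reduced to an accession/symbol table. This is a minimal sketch, assuming the symbol is always the last parenthesised token on the line (true for the output above, but not guaranteed for every record). `defline_to_symbol` is a hypothetical helper name, not part of NCBI's tools.

```shell
# Hypothetical helper: read FASTA deflines on stdin and print
# "accession<TAB>symbol" for each one.
# Assumption: the gene symbol is the LAST parenthesised token on the line,
# e.g. "... 10D (putative) (ATP10D), transcript variant X6, mRNA" -> ATP10D.
defline_to_symbol() {
    # The greedy ".*" pushes the match to the last "(...)" group;
    # lines that do not match the pattern are silently skipped (-n ... p).
    sed -E -n 's/^>([^ ]+) .*\(([^()]+)\)[^()]*$/\1 \2/p' | tr ' ' '\t'
}

# Example with one defline copied from the output above.
printf '%s\n' '>NM_021088.3 Homo sapiens zinc finger protein 2 (ZNF2), transcript variant 1, mRNA' \
    | defline_to_symbol
```

The same function could be put at the end of the efetch loop in place of a plain `grep ">"` result, which should give a two-column table that is easier to paste into DAVID or a spreadsheet.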


Great, thank you very much! Much appreciated.


Hi there, due to an error I've realised I actually have to do this for around 300 transcript IDs. I'm trying to use your command in Linux but struggling to understand how you did this. Is nuccore a file you could share with me, please?


nuccore is the nucleotide database at NCBI that eutils is searching. Put your query IDs in a plain-text file (any name you want, one ID per line). Then replace the file name I used (new) in the command above with your file name.
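As a rough sketch of those steps: the snippet below tidies an ID file before looping over it, since stray Windows line endings or blank lines are a common reason a loop like this misbehaves. The function names `clean_ids` and `fetch_deflines` and the file name `ids.txt` are made up for illustration; only the `efetch -db nuccore -id ... -format fasta` call comes from the answer above, and it assumes NCBI's efetch is installed.

```shell
# Hypothetical helper: normalise an ID list (one accession per line).
# Removes Windows carriage returns, leading/trailing whitespace and blank lines.
clean_ids() {
    tr -d '\r' < "$1" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
        | grep -v '^$'
}

# Hypothetical wrapper around the loop from the answer above.
# Requires NCBI's efetch on PATH and network access.
fetch_deflines() {
    for id in $(clean_ids "$1"); do
        efetch -db nuccore -id "$id" -format fasta | grep ">"
    done
}

# Usage (ids.txt is an assumed file name):
#   fetch_deflines ids.txt > deflines.txt
```

For a few hundred IDs, one request per ID as above should be fine, just slow.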


Hi, thanks for getting back to me. I'm still having issues. Here is the error with one ID (the same error occurs repeatedly when I use my query ID file in the for loop). It seems to want an index file, because -i is the option for the index file. No idea what's going wrong.

efetch -db nuccore -id NM_021088.3 -format fasta
Missing idxfile for option -i.

EFETCH - retrieve entries from sequence databases.

Synopsis: efetch -options [database:]<query>

Databases: SWissprot/SP, PIR, WOrmpep/WP, EMbl, GEnbank/GB, ProDom, ProSite

Options:
-a            Search with Accession number
-f            Fasta format output
-q            Sequence only output (one line)
-s <#>        Start at position #
-e <#>        Stop at position #
-o            More options and info...

-D <dir>      Specify database directory
-H            Display index header data
-p            Display entrynames in search path
-r            Print sequence in 'raw' format
-m            Fetch from mixed mini database
-M            Mini format output
-b            Do NOT reverse the order of bytes
                          (SunOS, IRIX do reverse, Alpha not)
-d <dbfile>   Specify database file (avoid this)
-i <idxfile>  Specify index file (avoid this)
-l <divfile>  Specify division lookup table (avoid this)
-B <database> Specify database (archaic)
-A            Only return entryname for accession number
-n <name>     Give the sequence this name
-x            Don't require query to match entry's name exactly (avoid)
-w            For Wormpep: also fetch cross-referenced SwissProt entry
-h            shows this help text

Environment:
SWDIR      = SwissProt directory - database and EMBL index files
PIRDIR     = PIR -- " --
WORMDIR    = Wormpep -- " --
EMBLDIR    = EMBL -- " --
GBDIR      = Genbank -- " --
PRODOMDIR  = ProDom -- " --
PROSITEDIR = ProSite -- " --
DBDIR      = User's own -- " -- (fasta format)

SEQDB      database file (default SwissProt)
SEQDBIDX   index file
DIVTABL    division lookup table

Ex. setenv DBDIR /pubseq/seqlibs/embl/

Note that Prodom family consensus seqs can be fetched by PD:_#

by Erik Sonnhammer (esr@sanger.ac.uk) Version 2.1,


I have formatted all of my IDs into a file like this (one ID per line)

XM_017006279.1


You are not using efetch from NCBI unix utils. Follow the directions here to download them.

It appears that you have some other program called efetch from EBI-Sanger Center.
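One way to check which efetch the shell is picking up is to look at where it lives and what its help text says. This is only a sketch: it keys on the banner line printed in the usage message posted above and on the `-h` option listed there, and `looks_like_sanger_efetch` is a hypothetical helper name.

```shell
# Hypothetical check: read a tool's usage text on stdin and succeed if it
# contains the banner printed by the non-NCBI efetch shown in this thread.
looks_like_sanger_efetch() {
    grep -q 'EFETCH - retrieve entries from sequence databases'
}

# If an efetch is on PATH, print its location and classify its help output.
if command -v efetch >/dev/null 2>&1; then
    command -v efetch
    if efetch -h 2>&1 </dev/null | looks_like_sanger_efetch; then
        echo "This efetch is not NCBI's; install NCBI's tools and put them first on PATH."
    fi
fi
```

If the printed path points somewhere other than the directory where the NCBI utilities were installed, reordering PATH or calling NCBI's efetch by its full path should be enough.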

I can do this fine with NCBI's efetch:

efetch -db nuccore -id NM_021088.3 -format fasta | grep ">"
>NM_021088.3 Homo sapiens zinc finger protein 2 (ZNF2), transcript variant 1, mRNA

Got it working in the end. Thank you very much for all your help. This is contributing towards the data analysis for my BSc dissertation. Would you like to give me your credentials so your help can be acknowledged in my dissertation?
