Hi, I have a set of mouse genes for which I would like to get the function, tissue specificity and other features. I have obtained a .dat file from UniProt that has all of this information, but it also contains a lot of information I don't need. The data seems to be in GenBank format. How can I extract only the features I'm interested in? When I searched, I found this site http://www.molbiol-tools.ca/Convert.htm and tried the gbk2ffn tool, but it gives an error. I'm not sure whether there is another way to do this. This is how my file appears:
ID 1433B_MOUSE Reviewed; 246 AA.
AC Q9CQV8; O70455; Q3TY33; Q3UAN6;
DT 26-SEP-2001, integrated into UniProtKB/Swiss-Prot.
DT 23-JAN-2007, sequence version 3.
DT 24-JUL-2013, entry version 118.
DE RecName: Full=14-3-3 protein beta/alpha;
DE AltName: Full=Protein kinase C inhibitor protein 1;
DE Short=KCIP-1;
DE Contains:
DE RecName: Full=14-3-3 protein beta/alpha, N-terminally processed;
GN Name=Ywhab;
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi;
OC Muroidea; Muridae; Murinae; Mus; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA].
RC STRAIN=C57BL/6J;
RA Karpitskiy V.V., Shaw A.S.;
RL Submitted (APR-1998) to the EMBL/GenBank/DDBJ databases.
RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC STRAIN=C57BL/6J;
RC TISSUE=Bone marrow, Embryo, Kidney, Liver, Thymus, and Visual cortex;
RX PubMed=16141072; DOI=10.1126/science.1112014;
RA Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N.,
RA Oyama R., Ravasi T., Lenhard B., Wells C., Kodzius R., Shimokawa K.,
RA Bajic V.B., Brenner S.E., Batalov S., Forrest A.R., Zavolan M.,
RA Davis M.J., Wilming L.G., Aidinis V., Allen J.E.,
RA Ambesi-Impiombato A., Apweiler R., Aturaliya R.N., Bailey T.L.,
RA Bansal M., Baxter L., Beisel K.W., Bersano T., Bono H., Chalk A.M.,
RA Chiu K.P., Choudhary V., Christoffels A., Clutterbuck D.R.,
RA Crowe M.L., Dalla E., Dalrymple B.P., de Bono B., Della Gatta G.,
RA di Bernardo D., Down T., Engstrom P., Fagiolini M., Faulkner G.,
RA Fletcher C.F., Fukushima T., Furuno M., Futaki S., Gariboldi M.,
RA Georgii-Hemming P., Gingeras T.R., Gojobori T., Green R.E.,
RA Gustincich S., Harbers M., Hayashi Y., Hensch T.K., Hirokawa N.,
RA Hill D., Huminiecki L., Iacono M., Ikeo K., Iwama A., Ishikawa T.,
RA Jakt M., Kanapin A., Katoh M., Kawasawa Y., Kelso J., Kitamura H.,
RA Kitano H., Kollias G., Krishnan S.P., Kruger A., Kummerfeld S.K.,
RA Kurochkin I.V., Lareau L.F., Lazarevic D., Lipovich L., Liu J.,
RA Liuni S., McWilliam S., Madan Babu M., Madera M., Marchionni L.,
RA Matsuda H., Matsuzawa S., Miki H., Mignone F., Miyake S., Morris K.,
RA Mottagui-Tabar S., Mulder N., Nakano N., Nakauchi H., Ng P.,
RA Nilsson R., Nishiguchi S., Nishikawa S., Nori F., Ohara O.,
RA Okazaki Y., Orlando V., Pang K.C., Pavan W.J., Pavesi G., Pesole G.,
RA Petrovsky N., Piazza S., Reed J., Reid J.F., Ring B.Z., Ringwald M.,
RA Rost B., Ruan Y., Salzberg S.L., Sandelin A., Schneider C.,
RA Schoenbach C., Sekiguchi K., Semple C.A., Seno S., Sessa L., Sheng Y.,
RA Shibata Y., Shimada H., Shimada K., Silva D., Sinclair B.,
RA Sperling S., Stupka E., Sugiura K., Sultana R., Takenaka Y., Taki K.,
RA Tammoja K., Tan S.L., Tang S., Taylor M.S., Tegner J., Teichmann S.A.,
RA Ueda H.R., van Nimwegen E., Verardo R., Wei C.L., Yagi K.,
RA Yamanishi H., Zabarovsky E., Zhu S., Zimmer A., Hide W., Bult C.,
RA Grimmond S.M., Teasdale R.D., Liu E.T., Brusic V., Quackenbush J.,
RA Wahlestedt C., Mattick J.S., Hume D.A., Kai C., Sasaki D., Tomaru Y.,
RA Fukuda S., Kanamori-Katayama M., Suzuki M., Aoki J., Arakawa T.,
RA Iida J., Imamura K., Itoh M., Kato T., Kawaji H., Kawagashira N.,
RA Kawashima T., Kojima M., Kondo S., Konno H., Nakano K., Ninomiya N.,
RA Nishio T., Okada M., Plessy C., Shibata K., Shiraki T., Suzuki S.,
RA Tagami M., Waki K., Watahiki A., Okamura-Oho Y., Suzuki H., Kawai J.,
RA Hayashizaki Y.;
RT "The transcriptional landscape of the mammalian genome.";
RL Science 309:1559-1563(2005).
RN [3]
RP PROTEIN SEQUENCE OF 1-12; 14-57; 61-70; 84-117; 128-169; 196-246 AND
RP 215-224, AND MASS SPECTROMETRY.
RC STRAIN=C57BL/6, and OF1; TISSUE=Brain, and Hippocampus;
RA Lubec G., Kang S.U., Sunyer B., Chen W.-Q.;
RL Submitted (JAN-2009) to UniProtKB.
RN [4]
RP PHOSPHORYLATION AT SER-60.
RX PubMed=9705322; DOI=10.1074/jbc.273.34.21834;
RA Megidish T., Cooper J., Zhang L., Fu H., Hakomori S.;
RT "A novel sphingosine-dependent protein kinase (SDK1) specifically
RT phosphorylates certain isoforms of 14-3-3 protein.";
RL J. Biol. Chem. 273:21834-21845(1998).
RN [5]
RP NITRATION [LARGE SCALE ANALYSIS] AT TYR-84 AND TYR-106, AND MASS
RP SPECTROMETRY.
RC TISSUE=Brain;
RX PubMed=16800626; DOI=10.1021/bi060474w;
RA Sacksteder C.A., Qian W.-J., Knyushko T.V., Wang H., Chin M.H.,
RA Lacan G., Melega W.P., Camp D.G. II, Smith R.D., Smith D.J.,
RA Squier T.C., Bigelow D.J.;
RT "Endogenously nitrated proteins in mouse brain: links to
RT neurodegenerative disease.";
RL Biochemistry 45:8009-8022(2006).
RN [6]
RP INTERACTION WITH PRKCE.
RX PubMed=18604201; DOI=10.1038/ncb1749;
RA Saurin A.T., Durgan J., Cameron A.J., Faisal A., Marber M.S.,
RA Parker P.J.;
RT "The regulated assembly of a PKCepsilon complex controls the
RT completion of cytokinesis.";
RL Nat. Cell Biol. 10:891-901(2008).
RN [7]
RP INTERACTION WITH SAMSN1.
RX PubMed=20478393; DOI=10.1016/j.biocel.2010.05.004;
RA Brandt S., Ellwanger K., Beuter-Gunia C., Schuster M., Hausser A.,
RA Schmitz I., Beer-Hammer S.;
RT "SLy2 targets the nuclear SAP30/HDAC1 complex.";
RL Int. J. Biochem. Cell Biol. 42:1472-1481(2010).
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathways. Binds
CC to a large number of partners, usually by recognition of a
CC phosphoserine or phosphothreonine motif. Binding generally results
CC in the modulation of the activity of the binding partner. Negative
CC regulator of osteogenesis. Blocks the nuclear translocation of the
CC phosphorylated form (by AKT1) of SRPK2 and antagonizes its
CC stimulatory effect on cyclin D1 expression resulting in blockage
CC of neuronal apoptosis elicited by SRPK2 (By similarity).
CC -!- SUBUNIT: Homodimer, and heterodimer with YWHAG, YWHAE and YWHAQ.
CC Interacts with SSH1 and TORC2/CRTC2. Interacts with GAB2 and YAP1
CC (phosphorylated form) (By similarity). Interacts with SAMSN1.
CC Interacts with PKA-phosphorylated AANAT (By similarity). Interacts
CC with the phosphorylated (by AKT1) form of SRPK2 (By similarity).
CC Interacts with PRKCE (phosphorylated form).
CC -!- INTERACTION:
CC Q5S006:Lrrk2; NbExp=3; IntAct=EBI-771608, EBI-2693710;
CC -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity). Melanosome (By
CC similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative initiation; Named isoforms=2;
CC Name=Long;
CC IsoId=Q9CQV8-1; Sequence=Displayed;
CC Name=Short;
CC IsoId=Q9CQV8-2; Sequence=VSP_018634;
CC Note=No experimental confirmation available. Contains a
CC N-acetylmethionine at position 1 (By similarity);
CC -!- PTM: Isoform alpha differs from isoform beta in being
CC phosphorylated (By similarity). Phosphorylated on Ser-60 by
CC protein kinase C delta type catalytic subunit in a sphingosine-
CC dependent fashion.
CC -!- PTM: Isoform Short contains a N-acetylmethionine at position 1 (By
CC similarity).
CC -!- SIMILARITY: Belongs to the 14-3-3 family.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; AF058797; AAC14343.1; -; mRNA.
DR EMBL; AK002632; BAB22246.1; -; mRNA.
DR EMBL; AK004872; BAB23631.1; -; mRNA.
DR EMBL; AK011389; BAB27587.1; -; mRNA.
DR EMBL; AK083367; BAC38886.1; -; mRNA.
DR EMBL; AK144061; BAE25678.1; -; mRNA.
DR EMBL; AK150414; BAE29538.1; -; mRNA.
DR EMBL; AK151294; BAE30278.1; -; mRNA.
DR EMBL; AK158932; BAE34730.1; -; mRNA.
DR IPI; IPI00230682; -.
DR IPI; IPI00760000; -.
DR RefSeq; NP_061223.2; NM_018753.6.
DR UniGene; Mm.34319; -.
DR PDB; 4GNT; X-ray; 2.41 A; A=1-239.
DR PDBsum; 4GNT; -.
DR ProteinModelPortal; Q9CQV8; -.
DR SMR; Q9CQV8; 2-232.
DR IntAct; Q9CQV8; 612.
DR MINT; MINT-1869492; -.
DR PhosphoSite; Q9CQV8; -.
DR UCD-2DPAGE; Q9CQV8; -.
DR PaxDb; Q9CQV8; -.
DR PRIDE; Q9CQV8; -.
DR Ensembl; ENSMUST00000018470; ENSMUSP00000018470; ENSMUSG00000018326.
DR GeneID; 54401; -.
DR KEGG; mmu:54401; -.
DR UCSC; uc008ntp.1; mouse.
DR CTD; 7529; -.
DR MGI; MGI:1891917; Ywhab.
DR eggNOG; COG5040; -.
DR GeneTree; ENSGT00710000106445; -.
DR HOGENOM; HOG000240379; -.
DR HOVERGEN; HBG050423; -.
DR InParanoid; Q9CQV8; -.
DR KO; K16197; -.
DR OMA; CNDVLXT; -.
DR OrthoDB; EOG4N30PR; -.
DR Reactome; REACT_147847; Translocation of Glut4 to the Plasma Membrane.
DR ChiTaRS; YWHAB; mouse.
DR NextBio; 311260; -.
DR ArrayExpress; Q9CQV8; -.
DR Bgee; Q9CQV8; -.
DR Genevestigator; Q9CQV8; -.
DR GO; GO:0030659; C:cytoplasmic vesicle membrane; TAS:Reactome.
DR GO; GO:0005829; C:cytosol; TAS:Reactome.
DR GO; GO:0042470; C:melanosome; IEA:UniProtKB-SubCell.
DR GO; GO:0048471; C:perinuclear region of cytoplasm; IEA:Compara.
DR GO; GO:0017053; C:transcriptional repressor complex; IEA:Compara.
DR GO; GO:0019904; F:protein domain specific binding; IDA:MGI.
DR GO; GO:0003714; F:transcription corepressor activity; IEA:Compara.
DR GO; GO:0051220; P:cytoplasmic sequestering of protein; IEA:Compara.
DR GO; GO:0035308; P:negative regulation of protein dephosphorylation; IEA:Compara.
DR GO; GO:0045892; P:negative regulation of transcription, DNA-dependent; IEA:Compara.
DR GO; GO:0043085; P:positive regulation of catalytic activity; IEA:Compara.
DR GO; GO:0051291; P:protein heterooligomerization; IEA:Compara.
DR GO; GO:0006605; P:protein targeting; IDA:MGI.
DR Gene3D; 1.20.190.20; -; 1.
DR InterPro; IPR000308; 14-3-3.
DR InterPro; IPR023409; 14-3-3_CS.
DR InterPro; IPR023410; 14-3-3_domain.
DR PANTHER; PTHR18860; PTHR18860; 1.
DR Pfam; PF00244; 14-3-3; 1.
DR PIRSF; PIRSF000868; 14-3-3; 1.
DR PRINTS; PR00305; 1433ZETA.
DR SMART; SM00101; 14_3_3; 1.
DR SUPFAM; SSF48445; 14-3-3; 1.
DR PROSITE; PS00796; 1433_1; 1.
DR PROSITE; PS00797; 1433_2; 1.
PE 1: Evidence at protein level;
KW 3D-structure; Acetylation; Alternative initiation; Complete proteome;
KW Cytoplasm; Direct protein sequencing; Nitration; Phosphoprotein;
KW Reference proteome.
FT CHAIN 1 246 14-3-3 protein beta/alpha.
FT /FTId=PRO_0000367902.
FT INIT_MET 1 1 Removed; alternate (By similarity).
FT CHAIN 2 246 14-3-3 protein beta/alpha, N-terminally
FT processed.
FT /FTId=PRO_0000000005.
FT SITE 58 58 Interaction with phosphoserine on
FT interacting protein (By similarity).
FT SITE 129 129 Interaction with phosphoserine on
FT interacting protein (By similarity).
FT MOD_RES 1 1 N-acetylmethionine (By similarity).
FT MOD_RES 2 2 N-acetylthreonine; in 14-3-3 protein
FT beta/alpha, N-terminally processed (By
FT similarity).
FT MOD_RES 60 60 Phosphoserine.
FT MOD_RES 70 70 N6-acetyllysine (By similarity).
FT MOD_RES 84 84 Nitrated tyrosine.
FT MOD_RES 106 106 Nitrated tyrosine.
FT MOD_RES 117 117 N6-acetyllysine (By similarity).
FT MOD_RES 186 186 Phosphoserine (By similarity).
FT VAR_SEQ 1 2 Missing (in isoform Short).
FT /FTId=VSP_018634.
FT CONFLICT 10 10 Q -> H (in Ref. 1; AAC14343).
FT CONFLICT 74 74 N -> D (in Ref. 1; AAC14343).
FT CONFLICT 126 126 D -> Y (in Ref. 2; BAE29538/BAE30278).
FT HELIX 5 17
FT HELIX 21 33
FT HELIX 40 68
FT HELIX 75 105
FT HELIX 107 110
FT HELIX 114 134
FT HELIX 137 161
FT HELIX 167 182
FT HELIX 187 203
FT HELIX 205 207
FT TURN 210 212
FT HELIX 213 230
SQ SEQUENCE 246 AA; 28086 MW; 51C366ED85B38EED CRC64;
MTMDKSELVQ KAKLAEQAER YDDMAAAMKA VTEQGHELSN EERNLLSVAY KNVVGARRSS
WRVISSIEQK TERNEKKQQM GKEYREKIEA ELQDICNDVL ELLDKYLILN ATQAESKVFY
LKMKGDYFRY LSEVASGENK QTTVSNSQQA YQEAFEISKK EMQPTHPIRL GLALNFSVFY
YEILNSPEKA CSLAKTAFDE AIAELDTLNE ESYKDSTLIM QLLRDNLTLW TSENQGDEGD
AGEGEN
//
There are many such entries in the single file. I want to extract only specific features, separately for each gene in my set.
Which specific features are you looking for?
Try GenBank parsers: http://oreilly.com/catalog/begperlbio/chapter/ch10.html
It's not a GenBank file; it's from UniProt.
I didn't pay much attention to the file. Nice catch.
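Since the file is a UniProtKB flat file rather than GenBank, one option is to read it by its two-letter line codes (ID, AC, GN, CC, DR, KW, SQ, with `//` closing each entry). Below is a minimal Python sketch of that idea, not a complete parser. The function names `parse_uniprot` and `select`, the dictionary layout, and the abridged `SAMPLE` text (cut down from the entry in the question) are all assumptions made for illustration. It only collects the ID, accessions, primary gene name, the `CC -!- TOPIC:` comment blocks, GO cross-references and keywords, and it ignores everything else.

```python
import re
from collections import defaultdict

# Abridged from the entry in the question; a real file has many entries.
SAMPLE = """\
ID 1433B_MOUSE Reviewed; 246 AA.
AC Q9CQV8; O70455; Q3TY33; Q3UAN6;
GN Name=Ywhab;
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathways.
CC -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity). Melanosome (By
CC similarity).
DR GO; GO:0005829; C:cytosol; TAS:Reactome.
KW 3D-structure; Acetylation; Alternative initiation; Complete proteome;
KW Reference proteome.
SQ SEQUENCE 246 AA; 28086 MW; 51C366ED85B38EED CRC64;
MTMDKSELVQ KAKLAEQAER
//
"""


def new_entry():
    # Hypothetical record layout chosen for this sketch.
    return {"id": None, "accessions": [], "gene": None,
            "comments": defaultdict(list), "go": [], "keywords": []}


def parse_uniprot(lines):
    """Yield one dict per entry from an iterable of flat-file lines."""
    entry, topic, in_seq, in_license = new_entry(), None, False, False
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):              # end-of-entry marker
            if entry["id"]:
                yield entry
            entry, topic, in_seq, in_license = new_entry(), None, False, False
            continue
        if in_seq:                             # skip sequence residues
            continue
        code, text = line[:2], line[2:].strip()
        if code == "ID":
            entry["id"] = text.split()[0]
        elif code == "AC":
            entry["accessions"] += text.replace(";", " ").split()
        elif code == "GN" and entry["gene"] is None:
            m = re.search(r"Name=([^;{]+)", text)
            if m:
                entry["gene"] = m.group(1).strip()
        elif code == "CC":
            if text.startswith("---"):         # dashed lines frame the license text
                in_license, topic = not in_license, None
            elif in_license:
                pass
            elif text.startswith("-!-"):       # a new comment topic starts
                topic, _, rest = text[3:].partition(":")
                topic = topic.strip()
                entry["comments"][topic].append(rest.strip())
            elif topic:                        # continuation of the current topic
                block = entry["comments"][topic]
                block[-1] = (block[-1] + " " + text).strip()
        elif code == "DR" and text.startswith("GO;"):
            parts = text.split("; ")
            entry["go"].append((parts[1], parts[2]))
        elif code == "KW":
            entry["keywords"] += [k.strip() for k in text.rstrip(".").split(";")
                                  if k.strip()]
        elif code == "SQ":
            in_seq = True


def select(entries, genes, topics):
    """Yield (gene, {topic: [text, ...]}) for entries whose gene is wanted."""
    wanted = {g.lower() for g in genes}
    for e in entries:
        if e["gene"] and e["gene"].lower() in wanted:
            yield e["gene"], {t: e["comments"].get(t, []) for t in topics}


for gene, info in select(parse_uniprot(SAMPLE.splitlines()),
                         ["Ywhab"], ["FUNCTION", "TISSUE SPECIFICITY"]):
    print(gene, info)
```

For a real file you would pass an open file handle instead of `SAMPLE.splitlines()`, and give `select` your own gene list and comment topics. A topic that an entry does not have comes back as an empty list, as `TISSUE SPECIFICITY` does for the entry shown in the question. If you would rather not maintain your own parser, Biopython's `Bio.SwissProt` module also reads this format.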