Hi all,
I've done all of the Python exercises on Codecademy, and I just downloaded Anaconda. If anyone can point me in the right direction for the task below, I'd be very grateful.
(1) I have all of these UniProt IDs: http://www.genome.jp/dbget-bin/get_linkdb?-t+9+ko:K02405
(2) If you click on any of them, you'll get something like this: http://www.genome.jp/dbget-bin/www_bget?uniprot:A0A023NVK1
(3) That page has identifiers for the species (OS) and taxonomy (OC):
OS   Dyella jiangningensis.
OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
OC   Rhodanobacteraceae; Dyella.
At the bottom, it has the sequence and its features.
(4)
FT   REGION       18     90       Sigma-70 factor domain-2.
FT                                {ECO:0000256|HAMAP-Rule:MF_00962}.
FT   REGION       98    170       Sigma-70 factor domain-3.
FT                                {ECO:0000256|HAMAP-Rule:MF_00962}.
FT   REGION      186    234       Sigma-70 factor domain-4.
FT                                {ECO:0000256|HAMAP-Rule:MF_00962}.
FT   MOTIF        45     48       Interaction with polymerase core subunit
FT                                RpoC. {ECO:0000256|HAMAP-Rule:MF_00962}.
SQ   SEQUENCE   254 AA;  28015 MW;  F3BD706CB822684E CRC64;
     MSVASEYLQL QRQSADELVR QHAPLVRRIA YHLMGRLPPS VDVSDLIQAG MIGLLEAARN
     FATGRNASFE TFAGIRIRGA MLDELRRTDW TPRSVHRKVR EMAEVVRQIE IETGADADDA
     EVMRRLGIGA EEYHQVLADA ASARLLSLSA PDDADGGAAF DVADGDSLGP QDSVEHEGMR
     EALVEAIGSL PEREQLVMSL YYEEELNLKE IGAVLGVTES RVCQIHGQAV VRLRARMSGW
     HDAVEQSQKQ KKKG
The lines that say "Sigma-70 factor domain-2" and "Sigma-70 factor domain-4" give the amino acid positions these domains cover, in this case 18-90 and 186-234, respectively. Those positions refer to the sequence at the bottom, starting from "MSVASE....".
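To make the coordinate step concrete, here is a minimal sketch in Python. It assumes the FT positions are 1-based and inclusive (the usual UniProt convention), so position 18 is index 17 in a Python string. `extract_region` is just an illustrative name, not a function from any library.

```python
# Sequence from the SQ block above; the spaces are only for readability.
SEQUENCE = (
    "MSVASEYLQL QRQSADELVR QHAPLVRRIA YHLMGRLPPS VDVSDLIQAG MIGLLEAARN "
    "FATGRNASFE TFAGIRIRGA MLDELRRTDW TPRSVHRKVR EMAEVVRQIE IETGADADDA "
    "EVMRRLGIGA EEYHQVLADA ASARLLSLSA PDDADGGAAF DVADGDSLGP QDSVEHEGMR "
    "EALVEAIGSL PEREQLVMSL YYEEELNLKE IGAVLGVTES RVCQIHGQAV VRLRARMSGW "
    "HDAVEQSQKQ KKKG"
)


def extract_region(sequence, start, end):
    """Return residues start..end, assuming 1-based inclusive positions."""
    residues = "".join(sequence.split())  # drop spaces and line breaks
    return residues[start - 1:end]        # shift start to Python's 0-based index


domain_2 = extract_region(SEQUENCE, 18, 90)    # 73 residues, starts with "LVRQHAPL"
domain_4 = extract_region(SEQUENCE, 186, 234)  # 49 residues, ends with "VRLR"
print(domain_2)
print(domain_4)
```

The only subtle part is `start - 1`: slicing with `residues[18:90]` would silently drop the first residue of the domain.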
What I want to do is take all the UniProt IDs from (1) and, for each ID's page (2), get the identifiers (3) together with the parts of the amino acid sequence covered by the domains in (4).
So in the above case, it would use the UniProt ID to spit out the following information:
UniProt ID: A0A023NVK1
Species: Dyella jiangningensis.
Taxonomy: Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Rhodanobacteraceae; Dyella.
Sequence: LVR QHAPLVRRIA YHLMGRLPPS VDVSDLIQAG MIGLLEAARN FATGRNASFE TFAGIRIRGA MLDELRRTDW (and) AIGSL PEREQLVMSL YYEEELNLKE IGAVLGVTES RVCQIHGQAV VRLR
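A rough sketch of how the per-record step might look, using only the standard library. It assumes the flat-file layout quoted above, with the FT start and end in separate columns. Newer UniProt downloads may format FT lines differently, in which case the regular expression would need adjusting. `parse_record` and the trimmed `RECORD` string are illustrative names, positions are treated as 1-based and inclusive, and downloading the text for each ID is left out.

```python
import re

# Trimmed copy of the record quoted above (only domain-2 and domain-4 kept).
RECORD = """\
OS   Dyella jiangningensis.
OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
OC   Rhodanobacteraceae; Dyella.
FT   REGION       18     90       Sigma-70 factor domain-2.
FT                                {ECO:0000256|HAMAP-Rule:MF_00962}.
FT   REGION      186    234       Sigma-70 factor domain-4.
FT                                {ECO:0000256|HAMAP-Rule:MF_00962}.
SQ   SEQUENCE   254 AA;  28015 MW;  F3BD706CB822684E CRC64;
     MSVASEYLQL QRQSADELVR QHAPLVRRIA YHLMGRLPPS VDVSDLIQAG MIGLLEAARN
     FATGRNASFE TFAGIRIRGA MLDELRRTDW TPRSVHRKVR EMAEVVRQIE IETGADADDA
     EVMRRLGIGA EEYHQVLADA ASARLLSLSA PDDADGGAAF DVADGDSLGP QDSVEHEGMR
     EALVEAIGSL PEREQLVMSL YYEEELNLKE IGAVLGVTES RVCQIHGQAV VRLRARMSGW
     HDAVEQSQKQ KKKG
//
"""

# Matches lines such as "FT   REGION       18     90       Sigma-70 ..."
REGION_RE = re.compile(r"^FT\s+REGION\s+(\d+)\s+(\d+)\s+(.*)$")


def parse_record(text):
    """Collect species, taxonomy and REGION subsequences from one record."""
    species, taxonomy, regions, seq_parts = [], [], [], []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if in_sequence:                  # everything after SQ is sequence
            seq_parts.append(line.replace(" ", ""))
        elif line.startswith("OS"):
            species.append(line[2:].strip())
        elif line.startswith("OC"):
            taxonomy.append(line[2:].strip())
        elif line.startswith("SQ"):
            in_sequence = True
        else:
            match = REGION_RE.match(line)
            if match:
                start, end, name = match.groups()
                regions.append((name.rstrip("."), int(start), int(end)))
    sequence = "".join(seq_parts)
    return {
        "species": " ".join(species),
        "taxonomy": " ".join(taxonomy),
        # Slice with start - 1 because FT positions are assumed 1-based.
        "regions": [(name, start, end, sequence[start - 1:end])
                    for name, start, end in regions],
    }


result = parse_record(RECORD)
print("Species:", result["species"])
print("Taxonomy:", result["taxonomy"])
for name, start, end, residues in result["regions"]:
    print(f"{name} ({start}-{end}): {residues}")
```

Looping this over every ID from (1) would then just mean fetching each record's text and passing it to the same function.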
So how do I get started?