Getting the nucleotide sequences of proteins from their UniProt IDs
Hello!
I have a database of proteins with various information for each one, including its UniProt ID, protein sequence, and more. I'll be working on metagenomics, comparing my sequences against the sequences from some samples, and for that I need the nucleotide sequence of each protein.
I searched for ways to get the nucleotide sequences but didn't find a direct one. I also don't have GenBank IDs for all of the proteins, which is a further obstacle. I would like to retrieve the sequences programmatically.
Does anyone have suggestions on how I could do this task?
Thank you so much in advance!
Nucleotide
Genomic
Sequence
Uniprot
Proteins
Using an XSLT stylesheet:
<?xml version='1.0' encoding="ISO-8859-1"?>
<xsl:stylesheet
xmlns:u="http://uniprot.org/uniprot"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'
>
<xsl:output method="text" encoding="ISO-8859-1" indent="yes" />
<xsl:template match="/">
<xsl:apply-templates select="u:uniprot"/>
</xsl:template>
<xsl:template match="u:uniprot">
<xsl:apply-templates select="u:entry"/>
</xsl:template>
<xsl:template match="u:entry">
<xsl:apply-templates select="u:dbReference[@type='RefSeq']"/>
</xsl:template>
<xsl:template match="u:dbReference">
<xsl:apply-templates select="u:property[@type='nucleotide sequence ID' and @value]"/>
</xsl:template>
<xsl:template match="u:property">wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&amp;id=<xsl:value-of select="@value"/>&amp;rettype=fasta"
</xsl:template>
</xsl:stylesheet>
Example (only the first lines of each record are shown):
wget -O - https://www.uniprot.org/uniprotkb/Q99697.xml | xsltproc transform.xsl - | bash
>NM_000325.5 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 3, mRNA
GTTAGGCCAACAGGGAAGCGCGGAGCCGCAGATCTGGTCCGTCGCTCGCCTGGGTGCCTGGAGCTGAGCT
GCGGCAAGGCCCGGCTCCTGTTCGACCGCCCGAGGGGTGTGCGTGTGCGCGTTGCGGAGGGTGCGCTCAG
AGGGCCGCGTCGTGGCTGCAGCGGCTGCTGCCGCCGCAGGGGATCTAATATCACCTACCTGTCCCTGTCA
--
>NM_001204397.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 4, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>NM_001204398.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 5, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGATAAC
--
>NM_001204399.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 6, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGATAAC
--
>NM_153426.2 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 2, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>NM_153427.2 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 1, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>XM_011532027.2 PREDICTED: Homo sapiens paired like homeodomain 2 (PITX2), transcript variant X1, mRNA
CAGAAAATCAGGGTTCAGAAGTAAGGCACACTTTTCGAGTGAGAATATGCCCTGTAATTTCACATACTCT
TTGCTTTGCAGGAGCAAATGTGGACTTGAGGGAAACTCTCTCCCCCACCCCCACTTCTATCCCGTGCAAT
TTAATACCATCCTCGCCAGGAACCTTAACCTCGTCATTTTAAAAAATGAGATATCCGTGACCCAGGGTGA
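Since the question is about a whole database of proteins, the same pipeline could be wrapped in a loop. The following is only a minimal sketch under some assumptions: the stylesheet is saved as `transform.xsl` in the current directory, the accessions are listed one per line in a file, and the function names (`uniprot_xml_url`, `fetch_all`) and the one-second pause are made up for illustration. Entries without a RefSeq cross-reference will simply produce no output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the UniProt XML URL for one accession,
# following the URL pattern used in the single-entry example.
uniprot_xml_url() {
    printf 'https://www.uniprot.org/uniprotkb/%s.xml\n' "$1"
}

# Hypothetical helper: run the wget | xsltproc | bash pipeline for every
# accession in a file (one per line). Blank lines and lines starting
# with '#' are skipped.
fetch_all() {
    local ids_file=$1 acc
    while IFS= read -r acc; do
        case $acc in ''|'#'*) continue ;; esac
        # stdin is redirected so nothing in the pipeline can consume the ID list
        wget -q -O - "$(uniprot_xml_url "$acc")" < /dev/null \
            | xsltproc transform.xsl - \
            | bash
        sleep 1   # arbitrary pause between accessions to go easy on both servers
    done < "$ids_file"
}
```

After sourcing these definitions, something like `fetch_all uniprot_ids.txt > nucleotides.fasta` would collect all records into one FASTA file (both file names are placeholders).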
From : https://www.uniprot.org/help/canonical_nucleotide/1000