Getting the nucleotide sequences of proteins from their UniProt IDs
8 months ago
Mariana ▴ 50

Hello!

I have a database of proteins with various information for each one, including its UniProt ID, protein sequence, and more. I'll be working on metagenomics and searching my sequences against the sequences of some samples, and for that I need the nucleotide sequences of my proteins.

I searched for ways to get the nucleotide sequences, but I didn't find any direct way. I don't have GenBank IDs for all of the proteins, which is also an impediment. I would like to get the sequences programmatically.

Does anyone have suggestions on how I could do this task?

Thank you so much in advance!

Nucleotide Genomic Sequence Uniprot Proteins • 565 views

From: https://www.uniprot.org/help/canonical_nucleotide/1000

How do I get the nucleotide sequence that corresponds to the canonical UniProtKB sequence?

You cannot! Although more than 95% of the known protein sequences derive from DNA translation, there is no single nucleic acid reference sequence for a given UniProtKB/Swiss-Prot protein sequence.

8 months ago

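Using an XSLT stylesheet that extracts the RefSeq nucleotide IDs from the UniProt XML entry and prints one wget command per ID: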

<?xml version='1.0'  encoding="ISO-8859-1"?>
<xsl:stylesheet
    xmlns:u="http://uniprot.org/uniprot"
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0' 
    > 
<xsl:output method="text" encoding="ISO-8859-1" indent="yes" />

<xsl:template match="/">
<xsl:apply-templates select="u:uniprot"/>
</xsl:template>


<xsl:template match="u:uniprot">
<GBSet>
<xsl:apply-templates select="u:entry"/>
</GBSet>
</xsl:template>

<xsl:template match="u:entry">
<xsl:apply-templates select="u:dbReference[@type='RefSeq']"/>
</xsl:template>

<xsl:template match="u:dbReference">
<xsl:apply-templates select="u:property[@type='nucleotide sequence ID' and @value]"/>
</xsl:template>

<xsl:template match="u:property">wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&amp;id=<xsl:value-of select="@value"/>&amp;rettype=fasta"
</xsl:template>

</xsl:stylesheet>

Example (with the stylesheet saved as transform.xsl):

wget -O - https://www.uniprot.org/uniprotkb/Q99697.xml | xsltproc transform.xsl - | bash

>NM_000325.5 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 3, mRNA
GTTAGGCCAACAGGGAAGCGCGGAGCCGCAGATCTGGTCCGTCGCTCGCCTGGGTGCCTGGAGCTGAGCT
GCGGCAAGGCCCGGCTCCTGTTCGACCGCCCGAGGGGTGTGCGTGTGCGCGTTGCGGAGGGTGCGCTCAG
AGGGCCGCGTCGTGGCTGCAGCGGCTGCTGCCGCCGCAGGGGATCTAATATCACCTACCTGTCCCTGTCA
--
>NM_001204397.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 4, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>NM_001204398.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 5, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGATAAC
--
>NM_001204399.1 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 6, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGATAAC
--
>NM_153426.2 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 2, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>NM_153427.2 Homo sapiens paired like homeodomain 2 (PITX2), transcript variant 1, mRNA
GCAGTCTGTGTAAGTTTTCATATCTCTGAGTGTGTGCACACAGTGGAGAGGGTGGAGCCTGCCATCCTCA
AATCTGAAAAGATTGAGAGATTTCAGAGGGCCCAGATGTGCCAAAGGTCAGAGGGATCAATATACAGGCC
CTACCACGGAAAGGCGGGGAAAAGGTTCGAATAGAAAACTGCTGCAGAAGGGAAGCCACTGAGAGGAGCA
--
>XM_011532027.2 PREDICTED: Homo sapiens paired like homeodomain 2 (PITX2), transcript variant X1, mRNA
CAGAAAATCAGGGTTCAGAAGTAAGGCACACTTTTCGAGTGAGAATATGCCCTGTAATTTCACATACTCT
TTGCTTTGCAGGAGCAAATGTGGACTTGAGGGAAACTCTCTCCCCCACCCCCACTTCTATCCCGTGCAAT
TTAATACCATCCTCGCCAGGAACCTTAACCTCGTCATTTTAAAAAATGAGATATCCGTGACCCAGGGTGA
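
If you would rather stay in Python than use xsltproc, the same idea can be sketched roughly as below. This is only an illustration under a few assumptions: the XML uses the same `http://uniprot.org/uniprot` namespace and the same RefSeq `nucleotide sequence ID` properties that the stylesheet above matches, and the function names `refseq_nucleotide_ids` and `efetch_fasta_url` are made up for this example. It only parses the XML and builds the efetch URL; downloading is left to you.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace and endpoint are the ones used in the stylesheet above.
UNIPROT_NS = {"u": "http://uniprot.org/uniprot"}
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def refseq_nucleotide_ids(uniprot_xml):
    """Return the RefSeq nucleotide accessions cross-referenced in a UniProt XML string.

    Hypothetical helper: mirrors the XPath logic of the XSLT templates.
    """
    root = ET.fromstring(uniprot_xml)
    path = (
        ".//u:entry/u:dbReference[@type='RefSeq']"
        "/u:property[@type='nucleotide sequence ID']"
    )
    ids = []
    for prop in root.iterfind(path, UNIPROT_NS):
        value = prop.get("value")
        # Skip properties without a value and keep each accession once.
        if value and value not in ids:
            ids.append(value)
    return ids


def efetch_fasta_url(accessions):
    """Build one efetch URL that requests all accessions as FASTA.

    Hypothetical helper: joins the accessions with commas in the id parameter.
    """
    query = urlencode(
        {"db": "nucleotide", "id": ",".join(accessions), "rettype": "fasta"},
        safe=",",  # keep the commas between accessions readable
    )
    return f"{EFETCH}?{query}"
```

You could feed it the text of the Q99697 XML downloaded as in the example above, then fetch the resulting URL with `urllib.request.urlopen` or wget. As the output above shows, one protein can map to several transcripts, so you still have to decide which nucleotide record to use for each UniProt ID.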