RefSeq accession numbers for proteins and nucleotides
7.1 years ago
horsedog ▴ 60

Hi, I have a basic question about RefSeq accession numbers. Non-redundant protein accessions start with WP_ and nucleotide accessions start with NZ_. I have thousands of protein sequences with different accession numbers. My goal is to find out whether certain proteins come from the same genome, that is, whether some WP_ accessions share the same NZ_ accession. Is it possible to do this?

Thanks

NCBI • 3.6k views

Could you give us some examples?

For example, WP_069981055.1 is the electron transporter RnfB protein from Geosporobacter ferrireducens, but how can we know which sequence it is translated from? I searched for Geosporobacter ferrireducens in the Gene database and got two RefSeq genome accessions, NZ_CP017269.1 and NZ_CP017270.1, and I don't know which of them encodes this protein. I also don't know whether it is reasonable to use a non-redundant protein accession to search for a genome, because the same protein is probably annotated on many different RefSeq genomes. So I don't know how to solve my problem.

This particular example (WP_069981055.1) is annotated on a single genome, based on an examination of the record. Take a look at the NCBI help page for RefSeq non-redundant protein categories, which describes how the entries appear in the full records (single species, multi-species, and multi-species (bacteria and archaea)).

The WP_069981055.1 protein is annotated for a single organism, Geosporobacter ferrireducens. As for which of the two sequences, NZ_CP017269.1 or NZ_CP017270.1, encodes your protein, we don't know: based on the current annotation of Geosporobacter ferrireducens, we cannot yet decide. It may be that either one or both encode it. You would have to validate this empirically, for example with knock-out experiments.

6.8 years ago
tdmurphy ▴ 230

Try the IPG report:

esummary -db protein -id WP_069981055.1 -format ipg
Id	Source	Nucleotide Accession	Start	Stop	Strand	Protein	Protein Name	Organism	Strain	Assembly
119361699	RefSeq	NZ_CP017269.1	5595878	5596861	+	WP_069981055.1	4Fe-4S dicluster domain-containing protein	Geosporobacter ferrireducens	IRF9	GCF_001750685.1
119361699	INSDC	CP017269.1	5595878	5596861	+	AOT72746.1	electron transporter RnfB	Geosporobacter ferrireducens	IRF9	GCA_001750685.1

That reports all assemblies that have a protein of exactly that sequence annotated. Stick to the rows with RefSeq in column 2, and use the last column to identify which particular assembly the annotation is on. Keep in mind that some assemblies have multiple sequences, so a different NZ_ accession doesn't necessarily mean a different assembly.
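For thousands of proteins, the grouping described above could be scripted. The following is only a minimal sketch, assuming the IPG report has been saved as tab-delimited text with the column headers shown in the output above. The function name `proteins_by_assembly` and the embedded sample string are illustrative and not part of any NCBI tool.

```python
import csv
import io
from collections import defaultdict

# Sample IPG report (assumed tab-delimited), copied from the output above.
IPG_TEXT = (
    "Id\tSource\tNucleotide Accession\tStart\tStop\tStrand\tProtein\t"
    "Protein Name\tOrganism\tStrain\tAssembly\n"
    "119361699\tRefSeq\tNZ_CP017269.1\t5595878\t5596861\t+\tWP_069981055.1\t"
    "4Fe-4S dicluster domain-containing protein\t"
    "Geosporobacter ferrireducens\tIRF9\tGCF_001750685.1\n"
    "119361699\tINSDC\tCP017269.1\t5595878\t5596861\t+\tAOT72746.1\t"
    "electron transporter RnfB\tGeosporobacter ferrireducens\tIRF9\t"
    "GCA_001750685.1\n"
)


def proteins_by_assembly(ipg_text, source="RefSeq"):
    """Map each assembly accession to the set of proteins annotated on it.

    Hypothetical helper: only rows whose Source column equals `source`
    are kept, following the advice to stick to RefSeq in column 2.
    """
    groups = defaultdict(set)
    reader = csv.DictReader(
        io.StringIO(ipg_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["Source"] != source:
            continue
        # Group on the Assembly column, not the nucleotide accession,
        # because one assembly can contain several NZ_ sequences.
        groups[row["Assembly"]].add(row["Protein"])
    return dict(groups)


groups = proteins_by_assembly(IPG_TEXT)
for assembly, proteins in sorted(groups.items()):
    print(assembly, ",".join(sorted(proteins)))
```

Two WP_ accessions that end up under the same GCF_ key are annotated on the same assembly, even when their NZ_ accessions differ. Concatenating reports for many proteins would need the repeated header lines removed first.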

7.1 years ago

Whenever you need to work with annotated genomes, I would suggest working with Ensembl, or Ensembl Bacteria in your case.
