RefSeq IDs question
6.2 years ago
n,n ▴ 370

Hello, I feel like I'm missing a simple concept or something regarding this:

I'm working with NCBI feature table files from bacterial genomes. In this type of tab-delimited text file, columns 11 and 12 are product_accession and non-redundant_refseq respectively. Most lines of the file contain ID strings in both of these columns (example: WP_000831330.1 & WP_000831330.1). However, some lines contain nothing in these columns even though they have a product name (protein name) assigned to them.

My question is precisely this: why do some of the annotated proteins not have either of these IDs assigned to them? Shouldn't all of them, in theory, have both IDs?
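A quick way to see which rows are affected is to tally them by feature type. Below is a minimal sketch, assuming the table's first line is a header starting with `# feature` and that the columns are named `feature`, `class`, `product_accession`, and `name`. The sample rows are trimmed and made up for illustration (only the accession is taken from the example above), and the helper name `rows_missing_accession` is hypothetical.

```python
import csv
import io
from collections import Counter


def rows_missing_accession(handle):
    """Return rows that have a product name but an empty product_accession."""
    # QUOTE_NONE: product names may contain quote characters that are not CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Assumption: the header line starts with "# feature"; drop the "# " prefix.
    header[0] = header[0].lstrip("# ")
    missing = []
    for fields in reader:
        row = dict(zip(header, fields))
        if row.get("name") and not row.get("product_accession"):
            missing.append(row)
    return missing


# Made-up, trimmed sample; a real feature table has more columns.
SAMPLE = "\n".join(
    "\t".join(cols)
    for cols in [
        ["# feature", "class", "product_accession", "non-redundant_refseq", "name"],
        ["gene", "protein_coding", "", "", ""],
        ["CDS", "with_protein", "WP_000831330.1", "WP_000831330.1", "example protein A"],
        ["CDS", "without_protein", "", "", "example protein B"],
    ]
) + "\n"

missing = rows_missing_accession(io.StringIO(SAMPLE))
# Tally the affected rows by (feature, class) to see whether they share a category.
tally = Counter((row["feature"], row["class"]) for row in missing)
print(tally)
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` sample. If the rows without IDs all fall into one feature/class combination, that narrows down why they have no accession.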

sequence • 927 views

At a first guess, I would imagine it's because not everything that's in NCBI is in the RefSeq database, since RefSeq is a curated, non-redundant subset. Or are you working with RefSeq data specifically to start with?


Hello, thanks for answering. Yes, I'm fetching the tables from the RefSeq database, and after your answer I checked how they compare to the GenBank ones. For some reason the GenBank tables don't have any IDs at all in these 2 columns, on any line, lol. That's understandable for the RefSeq accession, but shouldn't they have a product accession even in GenBank? I'm really confused, although what you said may be right: some of the products I'm working with might simply not be in RefSeq.
