3.0 years ago
Julia
What is the difference between GenBank and RefSeq?
GenBank - https://www.ncbi.nlm.nih.gov/genbank/
GenBank ® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
RefSeq - https://www.ncbi.nlm.nih.gov/refseq/about/
The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.
There is a FAQ entry that directly answers the parent question in this thread. It can be found here.
I think the above is quite buzzword-heavy and does not explain what RefSeq actually is. What do "comprehensive" and "integrated" really mean? I am not sure.
In a nutshell, I believe that RefSeq curators take entries submitted to GenBank and designate one GenBank entry as the representative sequence for each organism (this is the non-redundant part). They probably pick a representative sequence that is well annotated. But frankly, I could be wrong; I am only assuming this is what happens.
For example, of the millions of SARS-CoV-2 sequences out there, only one is the "reference" sequence, and that exact same sequence has both a GenBank ID and a RefSeq ID.
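To make the "two IDs for the same sequence" point concrete, here is a minimal sketch of how the two kinds of sequence accession can be told apart by format alone. It relies on the NCBI convention that RefSeq accessions start with a two-letter prefix followed by an underscore (NC_, NM_, NP_ and so on), while GenBank accessions contain no underscore. The function name `is_refseq_accession` is made up for illustration, and the two example accessions (the SARS-CoV-2 reference sequence as I know it) are my addition, not taken from the thread.

```python
def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession looks like a RefSeq sequence accession.

    Assumption: RefSeq sequence accessions have the form two letters,
    an underscore, then the rest (e.g. NC_045512.2). This checks the
    format only; it does not query NCBI or validate that the record exists.
    """
    acc = accession.strip().upper()
    return len(acc) > 3 and acc[:2].isalpha() and acc[2] == "_"


# The same SARS-CoV-2 sequence under its two identifiers (illustrative).
print(is_refseq_accession("NC_045512.2"))  # True  (RefSeq-style)
print(is_refseq_accession("MN908947.3"))   # False (GenBank-style)
```

This is only a format check, so it says nothing about how the record was curated; it just shows that the RefSeq copy is a separate record with its own accession.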
The RefSeq curation process is described in the link above. The main point is that RefSeq records are manually curated. These records are owned by NCBI, so NCBI has complete control over their content, as opposed to GenBank entries, which are owned by their submitters (and these can and do contain errors at times).
RefSeq genomes are a separate section. Prokaryotic RefSeq genomes are described on this page.
There are many GenBank genomes for SARS-CoV-2, but only a couple are designated as RefSeq (GCF* accession, listing truncated to save space):