2.5 years ago
beginner123
Hi, I would like to know the locus tag of a protein annotated with RefSeq (WP_*).
For example, I would like to identify the genomic location of a protein (WP_073031595.1) and also know its adjacent proteins.
The GenBank file has a locus tag and I can easily identify the gene location, but is there any way to identify the location on the genome in the RefSeq file?
Hi, you can use NCBI Datasets to answer some of your questions (see the NCBI Datasets download instructions).
For example, you can download a data package based on a protein accession and find the gene location in every genome that has that annotation. This works with the WP number you posted.
The RefSeq location can be found in the file annotation_report.jsonl. To see the info, you can use jq.
In the example you provided, the WP is annotated on one genome. But if you look at the protein accession WP_000997656, the annotation_report.jsonl has 4,338 lines (one per annotated genome).
We currently don't have the locus_tag in our report. Is that information something that you or your research group find useful? Please let us know. NCBI Datasets is in active development, and we love to hear feedback from users.
Feel free to reach out if you have any additional questions.
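If jq is not available, the same kind of inspection of annotation_report.jsonl can be sketched in Python. This is only a minimal sketch, not part of NCBI Datasets: the function names (read_jsonl, find_values, summarize) are made up for illustration, and because the exact field names in the report are not given here, the code makes no assumption about them. It counts the records (one per annotated genome), lists the top-level keys it sees, and searches each record for any key name you pass in.

```python
import json
from collections import Counter


def read_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def find_values(obj, key):
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found


def summarize(lines):
    """Return (number of records, Counter of top-level keys across all records)."""
    count = 0
    keys = Counter()
    for record in read_jsonl(lines):
        count += 1
        keys.update(record.keys())
    return count, keys


# Hypothetical usage (the path is an assumption; adjust to where the package was unzipped):
# with open("annotation_report.jsonl") as handle:
#     n_records, top_level_keys = summarize(handle)
# print(n_records, "records;", sorted(top_level_keys))
```

Once the top-level keys are known, find_values(record, "some_key") can pull out whichever field holds the coordinates. Searching for "locus_tag" should return an empty list, since that field is not currently in the report.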
Working with WP* records can be tricky since they can potentially point to multiple species. See: https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/