I have a .fasta file with amino acid sequences. The beginning of the file is as follows:
>lcl|NC_019674.1_prot_1 [locus_tag=BN341_RS00005] [protein=OmpA family protein] [pseudo=true] [location=join(1804546..1804601,1..456)] [gbkey=CDS]
MKKWFLAAAVVACVLMTGCPPRLKKPPPPPNPPPNLKNTPCPKKRPRPSP*KNPSPM*KVARLSGKCILI
LTNTMCVQTCKAQSMKP*KKSKNTV*KYSWRATPMSLVQANIILP*ATNAALV*KMF*LSRASVRTVLKW
*VLEKPNPFARKKLQSATVKTAVLTSKLWT
I am trying to find the source of this file. I believe I obtained it from NCBI Nucleotide (https://www.ncbi.nlm.nih.gov/nuccore/) while searching for the complete genome of a Helicobacter species. Once I found the species, I believe I clicked "Send to", then "Coding sequences", then "FASTA protein", and downloaded the result as a .fasta file.
Now I am trying to determine the exact origin of this .fasta file, because I want to give the NCBI Nucleotide link to colleagues. Is it possible to 'reverse engineer' a file like this and determine where I downloaded it from?
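The source accession is already in each header: in lcl|NC_019674.1_prot_1, NC_019674.1 is the RefSeq nucleotide record the coding sequences were taken from, and _prot_1 is the protein's running number in the export. You can search NCBI for
NC_019674
which will lead you to the corresponding genome page. Protein and nucleotide FASTA sequences are available in the top box. Note: these are representative sequences for multiple genomes and are labeled with WP identifiers.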
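Because the accession is embedded in every header, the lookup can be scripted. Below is a minimal Python sketch, assuming every header follows the ">lcl|<accession>_prot_<n> [key=value] ..." layout shown in the question (inferred from this one file, not from NCBI documentation). All function names are hypothetical. It extracts the accession and the bracketed qualifiers, builds the Nucleotide link, and builds an E-utilities EFetch URL with rettype=fasta_cds_aa, which I believe corresponds to the "Coding sequences > FASTA protein" export; compare a fetched file against yours before relying on that.

```python
import re
from urllib.parse import urlencode

# Assumed header layout (inferred from the file above, not from NCBI docs):
# ">lcl|<accession>_prot_<n> [key=value] [key=value] ..."
ID_PATTERN = re.compile(r"^>lcl\|(?P<accession>\S+?)_prot_(?P<index>\d+)")
# Bracketed qualifiers such as "[locus_tag=BN341_RS00005]".
QUALIFIER_PATTERN = re.compile(r"\[(\w+)=([^\]]*)\]")

NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{}"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_header(header: str) -> dict:
    """Split one FASTA header into accession, running index and qualifiers."""
    match = ID_PATTERN.match(header)
    if match is None:
        raise ValueError(f"unexpected header format: {header!r}")
    return {
        "accession": match.group("accession"),
        "index": int(match.group("index")),
        "qualifiers": dict(QUALIFIER_PATTERN.findall(header)),
    }


def nuccore_link(accession: str) -> str:
    """Link to the NCBI Nucleotide record, suitable for sharing."""
    return NUCCORE_URL.format(accession)


def cds_protein_url(accession: str) -> str:
    """EFetch URL assumed to reproduce the 'Coding sequences > FASTA protein' export."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta_cds_aa",  # translated coding sequences
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def source_accessions(fasta_path: str) -> set:
    """Collect the distinct nucleotide accessions named in a FASTA file's headers."""
    accessions = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                accessions.add(parse_header(line)["accession"])
    return accessions


if __name__ == "__main__":
    header = (">lcl|NC_019674.1_prot_1 [locus_tag=BN341_RS00005] "
              "[protein=OmpA family protein] [pseudo=true] "
              "[location=join(1804546..1804601,1..456)] [gbkey=CDS]")
    info = parse_header(header)
    print(info["accession"])                  # NC_019674.1
    print(info["qualifiers"]["locus_tag"])    # BN341_RS00005
    print(nuccore_link(info["accession"]))    # https://www.ncbi.nlm.nih.gov/nuccore/NC_019674.1
    print(cds_protein_url(info["accession"]))
```

For the whole file, source_accessions("sequences.fasta") (hypothetical file name) returns the set of accessions; a single element means every protein came from one nucleotide record, and that record's Nucleotide link is the one to share.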