This is kind of a general question regarding NCBI accession numbers.
Suppose I have this sequence:
>myseq
MGQ-----NSPNLLR------LSQ
--TLVGSSLLSSPSSPTTLKVKMPHAFPFLTPDQ-KKELSDIAHKIVAKGKGILAADES-
--TGSVAKRFQSINTENTEENRRLYRQLLFTA-DERAGPCIGGVIFFHETLYQKTDAGKT
FPEHVKSRGWVVGIKVDKGVVPLAGTN-GETTTQ---GLDGL--------YERCAQYKKD
GCDFAKWRCVLKITSTTPSRLAIMENCNVLARYASICQM--HGIVPIVEPEILPDGDHDL
KRTQYVTEKV-LAAMYKALSDHHVYLEGTLLKPNMVTAGHSCSHKYTHQDIAMATITALR
RTVPPAVPG--ITFLSGGQSEEEASINLNVMNQCPLHRPWAITFSYGRALQASALKAWGG
KPGNGKAAQEEFIKRAL------ANSLACQGKYVSSGN-S-A-AAGDSLFVANHAY
I want to BLAST it (using blastp and nr) against the salmon database (Salmo salar). I get three roughly equivalent hits corresponding to three different IDs:
NP_001133180.1, CBL79147.1 and NP_001133181.1
I doubt that these are three different genes. So which sequence(s) should I consider the 'good' one(s)? The most recent? The 'NP' ones? I could not find any information on how NCBI assigns sequence identifiers in detail (but see this). Many thanks for your advice!
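In general, you should use the RefSeq or Swiss-Prot databases for protein searches at NCBI, since they are more likely to contain well-curated representative sequences. Accessions with the NP_ prefix are RefSeq protein records, so among your three hits those two are the RefSeq entries.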
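If you want to separate RefSeq hits from the rest programmatically, here is a minimal sketch. It relies only on the RefSeq accession format (a two-letter prefix, an underscore, digits, and an optional version suffix). The function names `refseq_prefix` and `split_hits` are made up for this example and are not part of any NCBI tool. Note that it only recognizes the accession format; it says nothing about how well curated a given record is.

```python
import re

# RefSeq accessions look like NP_001133180.1: two uppercase letters,
# an underscore, digits, and an optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_\d+(?:\.\d+)?$")


def refseq_prefix(accession):
    """Return the two-letter RefSeq prefix (e.g. 'NP'), or None if the
    accession does not follow the RefSeq format. Hypothetical helper."""
    match = REFSEQ_PATTERN.match(accession.strip())
    return match.group(1) if match else None


def split_hits(accessions):
    """Split a list of accessions into (refseq, other), preserving order."""
    refseq, other = [], []
    for acc in accessions:
        if refseq_prefix(acc) is not None:
            refseq.append(acc)
        else:
            other.append(acc)
    return refseq, other


# The three hits from the question.
hits = ["NP_001133180.1", "CBL79147.1", "NP_001133181.1"]
refseq_hits, other_hits = split_hits(hits)
print("RefSeq:", refseq_hits)
print("Other: ", other_hits)
```

Applied to the three IDs above, this puts the two NP_ accessions in the RefSeq group and CBL79147.1 in the other group. Whether the two NP_ records are distinct genes still has to be checked by looking at the records themselves.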