How often do these version numbers change?
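To see how a record has changed between versions, open the record and look for Revision History under Display Settings, below the search box. The version is the number after the dot in the accession, and NCBI increments it whenever the sequence itself changes.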
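Version numbers are the suffix after the dot in an accession, as in NP_001266590.1 in the output further down. The hypothetical Bash helpers below are a minimal sketch that assumes every ID is written as ACCESSION.VERSION. They split such an ID and pick the newer of two versions of one accession:

```shell
#!/usr/bin/env bash
# Sketch: work with RefSeq IDs written as ACCESSION.VERSION (e.g. NP_001266590.1).
# split_accver and newer_version are hypothetical helpers, not NCBI tools.

# Print "ACCESSION VERSION" for an ID such as NM_001279661.1.
split_accver() {
    local id="$1"
    if [[ "$id" != *.* ]]; then
        echo "no version suffix in '$id'" >&2
        return 1
    fi
    printf '%s %s\n' "${id%.*}" "${id##*.}"
}

# Given two versions of the same accession, print the newer one.
# Fails if the accessions differ (e.g. an XM_ model vs. its NM_ replacement).
newer_version() {
    local a_acc="${1%.*}" a_ver="${1##*.}"
    local b_acc="${2%.*}" b_ver="${2##*.}"
    if [[ "$a_acc" != "$b_acc" ]]; then
        echo "different accessions: $a_acc vs $b_acc" >&2
        return 1
    fi
    # Numeric comparison, so version 10 counts as newer than version 2.
    if (( a_ver >= b_ver )); then
        echo "$1"
    else
        echo "$2"
    fi
}
```

The comparison is numeric rather than textual, so version 10 of an accession is reported as newer than version 2. Comparing an XM_ ID with an NM_ ID is rejected, because a replaced model and its successor are different accessions, not two versions of one.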
Is it likely that two RefSeq transcripts that are the same transcript (but different versions) would have different sequences?
It can happen. For example, XM_003440720 and NM_001279661 are two records for the same transcript: XM_003440720, a predicted model, is now obsolete and was replaced by NM_001279661. Strictly speaking they are not completely different sequences; the newer record looks like an improved version of the old one, with additional bases.
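One simple way to check a claim like "the newer record adds bases" is to trim the longest common prefix and suffix of the two sequences and look at what remains in the middle. The Bash sketch below does that. seq_diff is a hypothetical helper, and it assumes the sequences differ in a single contiguous region; it is not an aligner, so anything messier needs a proper alignment tool.

```shell
#!/usr/bin/env bash
# Sketch: report the single differing region between two sequences by trimming
# their longest common prefix and suffix. seq_diff is a hypothetical helper; it
# assumes one contiguous change and is no substitute for a proper alignment.

seq_diff() {
    local old="$1" new="$2"
    local lo=${#old} ln=${#new}
    local min=$(( lo < ln ? lo : ln ))
    local p=0 s=0

    # Longest common prefix.
    while (( p < min )) && [[ "${old:p:1}" == "${new:p:1}" ]]; do
        p=$((p + 1))
    done

    # Longest common suffix that does not overlap the prefix.
    while (( s < min - p )) && [[ "${old:lo-s-1:1}" == "${new:ln-s-1:1}" ]]; do
        s=$((s + 1))
    done

    # The middle of each sequence is what was removed from / added to the old one.
    printf 'position %d: removed "%s", added "%s"\n' \
        "$((p + 1))" "${old:p:lo-p-s}" "${new:p:ln-p-s}"
}
```

Applied to the two protein sequences in the output below (XP_ first, NP_ second, line breaks removed), it prints position 6: removed "", added "SPAGGVMDVNTALPEVLKTALIHDGLAPGIREAAKALDK". That means the NP_ protein carries a 39-residue insertion right after MAEEG that the XP_ model lacks.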
I'm looking for a data file that maps RefSeq transcripts to proteins but also takes version numbers into account.
One way to do this is with the NCBI E-utilities (eutils). In a terminal:
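# Fetch both nucleotide records as ASN.1 text, extract the NP_/XP_ protein
# accessions they reference (deduplicated, version dropped), and fetch the
# current version of each protein as FASTA.
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=XM_003440720,NM_001279661&retmode=text" | \
grep 'accession "' | \
sed -e 's/.*accession "//' -e 's/".*//' | \
grep -E '^(NP|XP)_' | \
awk '!seen[$0]++' | \
while read -r IDS ; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${IDS}&retmode=text&rettype=fasta"; done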
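The pipeline above extracts bare protein accessions, so efetch returns whatever protein version is current. If the versions themselves matter, one possible alternative is to ask efetch for the CDS translations of the nucleotide records (rettype=fasta_cds_aa) and read the transcript and protein accession.version pairs from the headers. This is a sketch under an assumption: the header layout described in the comments should be checked against real output. fetch_cds_aa and map_headers are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: version-aware transcript -> protein mapping from efetch CDS translations.
# Assumption: with rettype=fasta_cds_aa, each header looks roughly like
#   >lcl|NM_xxxxxxxxx.N_prot_NP_yyyyyyyyy.M_1 [gene=...] [protein_id=NP_yyyyyyyyy.M] ...
# fetch_cds_aa and map_headers are hypothetical helper names.

EFETCH="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Download CDS translations for a comma-separated list of nucleotide IDs.
fetch_cds_aa() {
    curl -s "${EFETCH}?db=nuccore&id=$1&rettype=fasta_cds_aa&retmode=text"
}

# Read FASTA on stdin and print "transcript.version<TAB>protein.version".
map_headers() {
    awk '/^>/ {
        tx = $1
        sub(/^>lcl[|]/, "", tx)   # drop the ">lcl|" prefix
        sub(/_prot_.*/, "", tx)   # keep only the transcript accession.version
        if (match($0, /\[protein_id=[A-Za-z0-9_.]+\]/)) {
            # skip "[protein_id=" (12 chars) and the closing "]"
            print tx "\t" substr($0, RSTART + 12, RLENGTH - 13)
        }
    }'
}
```

Usage would look like: fetch_cds_aa XM_003440720,NM_001279661 | map_headers, giving one tab-separated transcript.version / protein.version pair per coding sequence.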
The output contains the protein sequences for both the older and the newer record, XM_003440720 and NM_001279661:
>gi|348506442|ref|XP_003440768.1| PREDICTED: 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGRQAHLCVLAANCDEPMYVKLVEALCAEHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCV
VVKDYGKESQAKDVIEEYFKSKK
>gi|525343327|ref|NP_001266590.1| 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGSPAGGVMDVNTALPEVLKTALIHDGLAPGIREAAKALDKRQAHLCVLAANCDEPMYVKLVEALCA
EHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCVVVKDYGKESQAKDVIEEYFKSKK
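If only the protein accession.version is needed from FASTA output like the above, it can be read straight from the headers. The gi|...|ref|ACCESSION.VERSION| layout shown above is the older NCBI style. As an assumption, the sketch also accepts headers that begin directly with >ACCESSION.VERSION, which is how current NCBI FASTA headers usually look. fasta_ids is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the accession.version of each record in a protein FASTA stream.
# Handles the ">gi|...|ref|ACC.VER|" headers shown above and, as an assumption,
# headers that start directly with ">ACC.VER". fasta_ids is a hypothetical name.

fasta_ids() {
    awk '/^>/ {
        if ($0 ~ /^>gi[|]/) {
            split($0, f, "[|]")   # f[4] is the ref accession.version
            print f[4]
        } else {
            id = $1
            sub(/^>/, "", id)
            print id
        }
    }'
}
```

Piping the output above through fasta_ids prints XP_003440768.1 and NP_001266590.1, one per line.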