Hi
I would like to extract the longest isoform of each protein from a RefSeq proteome FASTA file. I looked at the solutions suggested on this and other fora, but I don't think any of them would work with a RefSeq proteome FASTA.
For example, I would like to select the longest isoforms from this FASTA: ftp://ftp.ncbi.nih.gov/genomes/Hyalella_azteca/protein/
Headers for this RefSeq proteome look as follows:
gi|1067085875|ref|XP_018026876.1| PREDICTED: calcineurin subunit B type 2 isoform X3 [Hyalella azteca]
gi|1067085873|ref|XP_018026875.1| PREDICTED: calcineurin subunit B type 2 isoform X2 [Hyalella azteca]
gi|1067085871|ref|XP_018026874.1| PREDICTED: calcineurin subunit B type 2 isoform X1 [Hyalella azteca]
gi|1067085879|ref|XP_018026878.1| PREDICTED: calcineurin subunit B type 2 isoform X5 [Hyalella azteca]
gi|1067085877|ref|XP_018026877.1| PREDICTED: calcineurin subunit B type 2 isoform X4 [Hyalella azteca]
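The only idea I have so far is to group records by their description with the "isoform X#" part removed and keep the longest sequence per group. Below is a rough Python sketch of that, not a tested solution. It assumes that isoforms of one gene share an otherwise identical description, which may not hold: paralogs with the same name would be merged, and proteins without an "isoform" label are each kept as they are. The function names (read_fasta, group_key, longest_isoforms) are my own and do not come from any library.

```python
import re

# Matches the " isoform X3" style suffix seen in the RefSeq descriptions.
ISOFORM_RE = re.compile(r"\s+isoform\s+\S+")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def group_key(header):
    """Build a grouping key from the description (ASSUMPTION: isoforms of
    one gene differ only in the 'isoform X#' part of the description)."""
    parts = header.split(None, 1)
    # Drop the accession field; fall back to the whole header if there is
    # no description after it.
    desc = parts[1] if len(parts) > 1 else header
    desc = desc.replace("PREDICTED: ", "", 1)
    return ISOFORM_RE.sub("", desc)


def longest_isoforms(records):
    """Return one (header, sequence) pair per key, keeping the longest.
    Ties are resolved in favour of the record seen first."""
    best = {}
    for header, seq in records:
        key = group_key(header)
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (header, seq)
    return list(best.values())
```

With the proteome saved locally (say as protein.fa, a placeholder name), longest_isoforms(read_fasta(open("protein.fa"))) would give one record per description. A mapping from protein accession to gene ID would be a more reliable grouping key than the description text, if one is available.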
Does anyone have an idea? Any suggestions are much appreciated!