5.3 years ago
muk.smita
I have a list of proteins that are identified as follows:
sp|P20930|FILA_HUMAN Filaggrin OS=Homo sapiens OX=9606 GN=FLG PE=1 SV=3
sp|Q5D862|FILA2_HUMAN Filaggrin-2 OS=Homo sapiens OX=9606 GN=FLG2 PE=1 SV=1
sp|P29508|SPB3_HUMAN Serpin B3 OS=Homo sapiens OX=9606 GN=SERPINB3 PE=1 SV=2
sp|Q08188|TGM3_HUMAN Protein-glutamine gamma-glutamyltransferase E OS=Homo sapiens OX=9606 GN=TGM3 PE=1 SV=4
sp|P31025|LCN1_HUMAN Lipocalin-1 OS=Homo sapiens OX=9606 GN=LCN1 PE=1 SV=1
sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=HIST1H4A PE=1 SV=2
Can I translate these identifiers into a more manageable form?
What do you mean by 'more manageable'? Would that be the UniProt accession (e.g., P20930), the full name (e.g., Filaggrin), or the gene symbol (e.g., FLG)? Please be more specific.
I would like to know how I can shorten each protein identifier to the full name and also the gene symbol.
Thank you
You can use awk to extract these.
For full name something like this will work.
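The command from the original answer is not preserved here. A minimal sketch, assuming every line follows the `sp|ACCESSION|ENTRY_NAME Full name OS=...` layout shown in the question (the function name `full_name` is hypothetical):

```shell
# Print the full protein name from UniProt-style header lines read on stdin.
# Assumes the layout "sp|ACCESSION|ENTRY_NAME Full name OS=...".
full_name() {
  awk '{
    sub(/^[^ ]+ /, "")    # drop the leading "sp|ACCESSION|ENTRY_NAME " token
    sub(/ OS=.*$/, "")    # drop " OS=" and everything after it
    print
  }'
}

printf '%s\n' 'sp|P20930|FILA_HUMAN Filaggrin OS=Homo sapiens OX=9606 GN=FLG PE=1 SV=3' | full_name
# prints: Filaggrin
```

With the list saved in a file (say `proteins.txt`, a placeholder name), `full_name < proteins.txt` would print one name per line. Names containing spaces, such as "Protein-glutamine gamma-glutamyltransferase E", are kept whole because only the text before the first space and after " OS=" is removed.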
For the gene symbols something like this.
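The original command is not preserved here either. One possible sketch, assuming the gene symbol always appears as a `GN=` field with no spaces in its value (the function name `gene_symbol` is hypothetical):

```shell
# Print the gene symbol (the value of the GN= field) for each header line on stdin.
# Lines without a GN= field are skipped.
gene_symbol() {
  awk '{
    if (match($0, /GN=[^ ]+/))
      print substr($0, RSTART + 3, RLENGTH - 3)   # skip the 3 characters of "GN="
  }'
}

printf '%s\n' 'sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=HIST1H4A PE=1 SV=2' | gene_symbol
# prints: HIST1H4A
```

Using `match` with `RSTART` and `RLENGTH` avoids relying on a fixed column position, which matters here because the full name and organism fields contain a varying number of spaces.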