This is an NCBI download of human protein-coding genes only.
I want to sub-split the last column on | so that the x-refs line up for sorting, but the entries are uneven (e.g. some rows have MIM in front of the others):
9606 10217 CTDSPL - C3orf8|HYA22|PSR1|RBSP3|SCP3 MIM:608592|HGNC:HGNC:16890|Ensembl:ENSG00000144677|Vega:OTTHUMG00000155942
9606 10218 ANGPTL7 - AngX|CDT6|dJ647M16.1 HGNC:HGNC:24078|Ensembl:ENSG00000171819|Vega:OTTHUMG00000002002
9606 10219 KLRG1 - 2F1|CLEC15A|MAFA|MAFA-2F1|MAFA-L|MAFA-LIKE MIM:604874|HGNC:HGNC:6380|Ensembl:ENSG00000139187|Vega:OTTHUMG00000168277
9606 10220 GDF11 - BMP-11|BMP11 MIM:603936|HGNC:HGNC:4216|Ensembl:ENSG00000135414|Vega:OTTHUMG00000170188
Sorting by what? Maybe we can extract the target into an extra column that you can sort by.
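If it helps, here is a rough Python sketch of that idea: pull one x-ref out into a leading column and sort on it. It assumes the x-refs are always the last whitespace-separated field and that each entry looks like `Name:value`; the function names `xref_value` and `add_sort_column` are made up for illustration, and the choice of Ensembl as the sort key is only a guess at what you want.

```python
def xref_value(line, db):
    """Return the value stored for database `db` in the last field, or '' if absent."""
    xrefs = line.split()[-1]          # assumption: x-refs are the last field
    for entry in xrefs.split("|"):
        # Split on the first colon only, so HGNC:HGNC:16890 -> ("HGNC", "HGNC:16890")
        name, _, value = entry.partition(":")
        if name == db:
            return value
    return ""


def add_sort_column(lines, db="Ensembl"):
    """Prefix each line with its `db` value (tab-separated) and sort by that value."""
    keyed = [(xref_value(line, db), line.rstrip("\n")) for line in lines if line.strip()]
    keyed.sort(key=lambda pair: pair[0])
    return [f"{key}\t{line}" for key, line in keyed]


if __name__ == "__main__":
    sample = [
        "9606 10217 CTDSPL - C3orf8|HYA22|PSR1|RBSP3|SCP3 MIM:608592|HGNC:HGNC:16890|Ensembl:ENSG00000144677|Vega:OTTHUMG00000155942",
        "9606 10218 ANGPTL7 - AngX|CDT6|dJ647M16.1 HGNC:HGNC:24078|Ensembl:ENSG00000171819|Vega:OTTHUMG00000002002",
    ]
    for row in add_sort_column(sample):
        print(row)
```

Rows that lack the chosen x-ref get an empty key and therefore sort to the top.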
Could you take out the MIM* entries with
sed 's/MIM:[0-9]*|//' your_file > new_file
and then split?
You have added the tag excel. Is that because you want to do this in Excel?
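If you would rather keep the MIM numbers than delete them, another option might be to give every database its own column and leave a blank where a row has no entry, so the columns line up when the file is opened in a spreadsheet. A minimal sketch, assuming the x-refs are the last whitespace-separated field and that only the four databases in your sample occur (the `DATABASES` list and the name `align_xrefs` are my own, not anything from NCBI):

```python
# Assumed column order; extend this list if other databases show up in the file.
DATABASES = ["MIM", "HGNC", "Ensembl", "Vega"]


def align_xrefs(line, databases=DATABASES):
    """Rewrite one line as tab-separated fields with one column per database."""
    *head, xrefs = line.split()       # everything before the x-ref field is kept as-is
    found = {}
    for entry in xrefs.split("|"):
        # Split on the first colon only, so HGNC:HGNC:24078 -> ("HGNC", "HGNC:24078")
        name, _, value = entry.partition(":")
        found[name] = value
    # Missing databases become empty cells, which keeps the columns aligned
    return "\t".join(head + [found.get(db, "") for db in databases])


if __name__ == "__main__":
    line = "9606 10218 ANGPTL7 - AngX|CDT6|dJ647M16.1 HGNC:HGNC:24078|Ensembl:ENSG00000171819|Vega:OTTHUMG00000002002"
    print(align_xrefs(line))
```

The tab-separated output should import into Excel with the MIM column simply empty for rows like ANGPTL7.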
Thanks for the replies - I will try some of these out