NCBI Gene column splits by x-ref
1
0
Entering edit mode
7.8 years ago
cdsouthan ★ 1.9k

This is NCBI human gene protein only download.

I want to column sub split by | to line up the x-refs for sorting but they are uneven (e.g. where there is MIM in front of the others)

9606    10217   CTDSPL  -   C3orf8|HYA22|PSR1|RBSP3|SCP3    MIM:608592|HGNC:HGNC:16890|Ensembl:ENSG00000144677|Vega:OTTHUMG00000155942
9606    10218   ANGPTL7 -   AngX|CDT6|dJ647M16.1    HGNC:HGNC:24078|Ensembl:ENSG00000171819|Vega:OTTHUMG00000002002
9606    10219   KLRG1   -   2F1|CLEC15A|MAFA|MAFA-2F1|MAFA-L|MAFA-LIKE  MIM:604874|HGNC:HGNC:6380|Ensembl:ENSG00000139187|Vega:OTTHUMG00000168277
9606    10220   GDF11   -   BMP-11|BMP11    MIM:603936|HGNC:HGNC:4216|Ensembl:ENSG00000135414|Vega:OTTHUMG00000170188
Excel • 1.7k views
ADD COMMENT
0
Entering edit mode

sorting by what? maybe we can extract the target to extra column by which you can sort.

ADD REPLY
0
Entering edit mode

Could you take out the MIM* entries with sed 's/MIM.*\|H/H/' your_file > new_file and then split.

You have added the tag excel. Is that because you want to do this in excel?

ADD REPLY
0
Entering edit mode

Thanks for the replies - I will try some of these out

ADD REPLY
0
Entering edit mode
7.8 years ago

A way to sort by HGNC:HGNC:xxxxx on Linux/OS X/Windows using csvtk without changing original column.

Steps:

  1. retrieve integers of HGNC:HGNC:xxxxx as a new (7th) column
  2. sort by 7th column
  3. remove 7th column

Command:

$ csvtk mutate -H -t -f 6 -p "HGNC:HGNC:(\d+)" data.tsv \
    | csvtk sort -H -t -k 7:n \
    | csvtk cut -H -t -f -7 

9606    10220   GDF11   -       BMP-11|BMP11    MIM:603936|HGNC:HGNC:4216|Ensembl:ENSG00000135414|Vega:OTTHUMG00000170188
9606    10219   KLRG1   -       2F1|CLEC15A|MAFA|MAFA-2F1|MAFA-L|MAFA-LIKE      MIM:604874|HGNC:HGNC:6380|Ensembl:ENSG00000139187|Vega:OTTHUMG00000168277
9606    10217   CTDSPL  -       C3orf8|HYA22|PSR1|RBSP3|SCP3    MIM:608592|HGNC:HGNC:16890|Ensembl:ENSG00000144677|Vega:OTTHUMG00000155942
9606    10218   ANGPTL7 -       AngX|CDT6|dJ647M16.1    HGNC:HGNC:24078|Ensembl:ENSG00000171819|Vega:OTTHUMG00000002002

PS: If you are using MS Excel, you can also copy the last column to some text editor like notepad++ and replace (Ctrl-H) MIM.+?\| with nothing, and then copy paste back to Excel. Even this, you can only sort by the last column in alphabetical order.

ADD COMMENT
0
Entering edit mode

If it is a a large file, you can save the Excel file as a 'text tab-delimited' file, and then process.

ADD REPLY

Login before adding your answer.

Traffic: 2956 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6