Find repeating strings in a column and assign each a 4-digit number
2.7 years ago
Priya ▴ 20

I have a file with two columns: column 1 holds a string of multiple characters and column 2 holds a unique ID or value, but column 1 can contain repeated strings. I want to assign a number from 1 to n to each distinct string in column 1, so that repeated strings share the same number. For example, if I have a file like this:

GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1
GCF_000009885.1_ASM988v1_protein.faa:WP_000025662.1
GCF_920103885.1_DJ_protein.faa:WP_230633553.1
GCF_920103885.1_DJ_protein.faa:WP_230633554.1
GCF_920103885.1_DJ_protein.faa:WP_230633555.1
and so on...

(It's a CSV file with columns separated by : ) What I want to get is:

0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000025662.1
0002:GCF_920103885.1_DJ_protein.faa:WP_230633553.1
0002:GCF_920103885.1_DJ_protein.faa:WP_230633554.1
0002:GCF_920103885.1_DJ_protein.faa:WP_230633555.1
and so on...

I'm working on Linux; any command that can do this would be helpful!

awk • 622 views
2.7 years ago
 awk -F ':' '{if(P!=$1) {P=$1;N++;} printf("%04d:%s\n",N,$0);}'  input.txt
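
The command above starts a new number whenever column 1 differs from the previous line, so it relies on identical strings sitting on adjacent lines (as in the sample). If the same string could reappear later in the file, a variant that remembers each string in an awk array may be safer. This is only a sketch under that assumption; the helper name `number_groups` and the sample data are made up for illustration and are not part of the original answer.

```shell
# number_groups is a hypothetical helper name used only for this sketch.
# It numbers each distinct column-1 value in order of first appearance,
# so repeats get the same number even when they are not on adjacent lines.
number_groups() {
  awk -F ':' '
    !($1 in id) { id[$1] = ++n }          # first time this key is seen: assign the next number
    { printf("%04d:%s\n", id[$1], $0) }   # prefix the line with the zero-padded number
  ' "$@"
}

# Made-up sample data with a non-adjacent repeat of the first key
printf '%s\n' 'A.faa:WP_1' 'B.faa:WP_2' 'A.faa:WP_3' | number_groups
```

It can also be called on a file, e.g. `number_groups input.txt`. Note that `%04d` only pads to four digits; with more than 9999 distinct strings the number simply becomes wider.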

Thanks, it worked!
