Question

reformat a gene association file

6.8 years ago

lessismore ★ 1.4k

Hey all,

I have a tab-separated text file with three columns:
1st column: a gene ID
2nd column: a value
3rd column: a comma-separated list of genes associated with the gene in the 1st column

TMCS09g1008699 6.4 TMCS09g1008677, TMCS09g1008681, TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686, TMCS09g1008680, TMCS09g1008675

etc.

What I want is this:

TMCS09g1008699 6.4 TMCS09g1008677
TMCS09g1008699 6.4 TMCS09g1008681
TMCS09g1008699 6.4 TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686
TMCS09g1008690 5.3 TMCS09g1008680
TMCS09g1008690 5.3 TMCS09g1008675

Could someone help me?

r bash awk • 1.2k views

Written 6.8 years ago by lessismore ★ 1.4k • updated 6.8 years ago by h.mon 35k

score 3 · Answer 1 · 2018-03-18

6.8 years ago

Pierre Lindenbaum 164k

 awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' input.txt  | sed 's/,$//'
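How this one-liner works on the example as posted: awk's default field splitting breaks on runs of spaces and tabs, so with ", " between genes each gene becomes its own field ($3, $4, ...), the loop prints one line per field, and `sed` strips the trailing comma left on every gene except the last. A minimal reproduction, wrapped in a function whose name `split_spaced` is hypothetical; note that the output columns are separated by a space (awk's default OFS), not a tab:

```shell
# split_spaced: hypothetical wrapper around the one-liner above.
# Assumes the 3rd-column list uses ", " (comma + space) between genes.
split_spaced() {
    # print columns 1 and 2 plus each remaining field, then drop the trailing comma
    awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' "$@" | sed 's/,$//'
}

# Example row from the question, with real tabs between the three columns
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677, TMCS09g1008681, TMCS09g1008685\n' \
    | split_spaced
# TMCS09g1008699 6.4 TMCS09g1008677
# TMCS09g1008699 6.4 TMCS09g1008681
# TMCS09g1008699 6.4 TMCS09g1008685
```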

Dear Pierre, I made a mistake: in my real input there is no space after the commas in the 3rd column, and now I get the wrong output.

Reply by lessismore ★ 1.4k • 6.8 years ago
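Why the output goes wrong when the list has no spaces: awk's default field splitting only breaks on spaces and tabs, so "TMCS09g1008677,TMCS09g1008681,..." stays a single field. NF is then 3, the loop runs once, and the whole list is printed on one line. A minimal reproduction, using the unchanged one-liner inside a hypothetical function `run_oneliner`:

```shell
# run_oneliner: hypothetical name for Pierre's unchanged one-liner
run_oneliner() {
    awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' "$@" | sed 's/,$//'
}

# Tab-separated columns, no spaces inside the comma-separated list
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681,TMCS09g1008685\n' \
    | run_oneliner
# The list is never split, so a single line comes out:
# TMCS09g1008699 6.4 TMCS09g1008677,TMCS09g1008681,TMCS09g1008685
```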

score 1 · Answer 2 · 2018-03-19

6.8 years ago

h.mon 35k

Combine Pierre's answer with Tom Fenech's answer at StackOverflow.

Three hints: 1) pay attention to which column you split; 2) you will have two awk commands, separated by a semicolon; and 3) you will not need the sed command.
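The hints point at awk's built-in `split()` function: `split(s, arr, sep)` fills `arr[1]` to `arr[n]` with the pieces of the string `s` and returns `n`. A single-character separator such as "," is matched literally, so only commas split. A small demonstration using the gene IDs from the question (the wrapper name `demo_split` is hypothetical):

```shell
# demo_split: hypothetical wrapper showing what split() returns
demo_split() {
    awk 'BEGIN {
        n = split("TMCS09g1008677,TMCS09g1008681,TMCS09g1008685", genes, ",")
        print n                     # number of pieces found
        for (i = 1; i <= n; i++)    # split() fills the array starting at index 1
            print i, genes[i]
    }'
}

demo_split
# 3
# 1 TMCS09g1008677
# 2 TMCS09g1008681
# 3 TMCS09g1008685
```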

Got it, thanks:

awk 'BEGIN{FS=OFS="\t"} {n=split($3,a,","); for(i=1;i<=n;i++) print $1,$2,a[i]}' input.txt

Reply by lessismore ★ 1.4k • 6.8 years ago
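The command in this reply splits on a bare comma, so it handles the no-space input but would keep a leading space on genes if the list were written as in the question's example ("A, B"). A minimal variant, assuming tab-separated input with the list in column 3, swaps in the separator ", *" (a comma followed by optional spaces; awk treats a multi-character separator string as a regular expression) so both layouts give clean, tab-separated output. The function name `explode_genes` is hypothetical:

```shell
# explode_genes: hypothetical helper; assumes tab-separated input, list in column 3
explode_genes() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = split($3, genes, ", *")   # split on a comma plus any following spaces
        for (i = 1; i <= n; i++)
            print $1, $2, genes[i]    # one output line per associated gene
    }' "$@"
}

# Mixed example: first row without spaces, second row with ", "
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681\nTMCS09g1008690\t5.3\tTMCS09g1008686, TMCS09g1008680\n' \
    | explode_genes
# On a file: explode_genes input.txt > output.txt
```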