Hello everyone,
I have a tab-delimited file, file1, in the following format:
g1 pfam PF12 ABC transporter
g2 pfam PF13 transcription factor
g3 pfam PF14 glycosyl hydrolase
pfam PF15 FAD binding domain
g4 pfam PF16 RTA1 like protein
pfam PF17 Zinc-binding dehydrogenase
pfam PF18 major facilitator superfamily
g5 pfam PF19 short chain dehydrogenase
g6 pfam PF20 ABC transporter
I want to rearrange this file so that each line beginning with pfam gets the gene ID (g3, g4, etc.) from the nearest preceding line that has one. The output file should also be tab-delimited and look like this:
g1 pfam PF12 ABC transporter
g2 pfam PF13 transcription factor
g3 pfam PF14 glycosyl hydrolase
g3 pfam PF15 FAD binding domain
g4 pfam PF16 RTA1 like protein
g4 pfam PF17 Zinc-binding dehydrogenase
g4 pfam PF18 major facilitator superfamily
g5 pfam PF19 short chain dehydrogenase
g6 pfam PF20 ABC transporter
Many thanks in advance.
Ambika
Nice one-liner. One question: you don't really need
first=""
in the BEGIN statement, do you?
Thank you. I will try that.
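For anyone who would rather not use awk, here is a rough Python sketch of the same fill-down idea. It is not the one-liner mentioned above. The function name fill_gene_ids and the marker parameter are made up for illustration, and it assumes that lines missing a gene ID start directly with pfam, with no leading empty column.

```python
def fill_gene_ids(lines, marker="pfam"):
    """Prefix each row that starts with `marker` with the last gene ID seen.

    Assumption: rows are tab-delimited, and rows lacking a gene ID begin
    directly with `marker` (there is no empty first column).
    """
    last_gene = ""  # stays empty if the very first row has no gene ID
    filled = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == marker:
            # Gene ID is missing: reuse the one from the nearest earlier row.
            fields.insert(0, last_gene)
        else:
            # Row carries its own gene ID: remember it for following rows.
            last_gene = fields[0]
        filled.append("\t".join(fields))
    return filled


# Small demonstration using rows from the question.
sample = [
    "g3\tpfam\tPF14\tglycosyl hydrolase",
    "pfam\tPF15\tFAD binding domain",
    "g4\tpfam\tPF16\tRTA1 like protein",
    "pfam\tPF17\tZinc-binding dehydrogenase",
    "pfam\tPF18\tmajor facilitator superfamily",
]
for row in fill_gene_ids(sample):
    print(row)
```

To run it on a real file, pass an open file object instead of the sample list, for example fill_gene_ids(open("file1")), and write the returned rows to the output file.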