Help re-formatting an Interproscan file
8.5 years ago
Biogeek ▴ 470

Hi,

I have a file with the following:

A0A009FD42,IPR005665,Protein-export membrane protein SecF, bacterial
A0A009FD42,IPR022813,Protein-export membrane protein SecD/SecF, archaeal and bacterial
A0A009FD43,IPR011032,GroES-like
A0A009FD43,IPR018369,Chaperonin GroES, conserved site
A0A009FD43,IPR020818,GroES chaperonin family
A0A009FD47,IPR007698,Alanine dehydrogenase/pyridine nucleotide transhydrogenase, NAD(H)-binding domain
A0A009FD47,IPR007886,Alanine dehydrogenase/pyridine nucleotide transhydrogenase, N-terminal
A0A009FD47,IPR008142,Alanine dehydrogenase/NAD(P) transhydrogenase, conserved site-1
A0A009FD47,IPR008143,Alanine dehydrogenase/pyridine nucleotide transhydrogenase, conserved site-2
A0A009FD47,IPR016040,NAD(P)-binding domain
A0A009FD48,IPR000788,Ribonucleotide reductase large subunit, C-terminal
A0A009FD48,IPR005144,ATP-cone domain
A0A009FD48,IPR008926,Ribonucleotide reductase R1 subunit, N-terminal

I want all the IPR accessions and descriptions on one line for each individual UniProt entry. How can I do this? I'm struggling to convert the file into a format I can modify further down the line. This is bioinformatics-related, so I hope the admins do not move this post.

I would like each entry to look like: A0A009FD47 IPRx, IPRy, IPRz etc., with all the IPR accessions for one UniProt accession on one line.

interproscan Reformat • 1.6k views
8.5 years ago
Biogeek ▴ 470

Got it sorted... For anyone that's interested:

awk -F, '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p,x); s=s $0} END{print s}' IPS_termsNR > test
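
The one-liner above keeps the descriptions as well. For the accession-only layout asked for in the question (A0A009FD47 IPRx, IPRy, IPRz), here is a minimal sketch. It assumes the file is comma-separated exactly as shown in the question, with the UniProt accession in field 1 and the IPR accession in field 2. The `sample` variable and the array names `ids` and `order` are made up for illustration. It also does not need the lines to be grouped by accession.

```shell
# Hypothetical sample data copied from the question (first two proteins only).
sample='A0A009FD42,IPR005665,Protein-export membrane protein SecF, bacterial
A0A009FD42,IPR022813,Protein-export membrane protein SecD/SecF, archaeal and bacterial
A0A009FD43,IPR011032,GroES-like
A0A009FD43,IPR018369,Chaperonin GroES, conserved site
A0A009FD43,IPR020818,GroES chaperonin family'

# Split on commas: $1 is the UniProt accession, $2 the IPR accession.
# Only $1 and $2 are used, so commas inside the descriptions do no harm.
result=$(printf '%s\n' "$sample" | awk -F',' '
    # First time this protein is seen: remember its position and first IPR.
    !($1 in ids) { order[++n] = $1; ids[$1] = $2; next }
    # Protein already seen: append the next IPR accession.
    { ids[$1] = ids[$1] ", " $2 }
    # Print one line per protein, in order of first appearance.
    END { for (i = 1; i <= n; i++) print order[i] " " ids[order[i]] }
')

printf '%s\n' "$result"
```

To run the same awk program on the real file, drop the `printf` pipe and pass the file name (here IPS_termsNR) as the last argument to awk.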