Hello everyone,
I have File 1 like this with 2 columns:
g4989 2.70224323450382
g4650 2.71483380183318
g11701 2.83907744860811
g11701 2.83907744860811
g3807 2.83912968405616
g17931 2.84821618321646
and File 2 like this, with 4 columns:
g4989
g4650 Pfam PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g11701 Pfam PF04082 Fungalspecifictranscriptionfactordomain
g17931 Pfam PF04082 Fungalspecifictranscriptionfactordomain
Both files are tab-delimited. File 2 contains only selected genes from File 1. I want to add the second column from File 1 to File 2, but only for the genes in File 2, like this:
g4989 2.70
g4650 2.71 Pfam PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g11701 2.83 Pfam PF04082 Fungalspecifictranscriptionfactordomain
g17931 2.84 Pfam PF04082 Fungalspecifictranscriptionfactordomain
Could you please help me sort this out in Linux?
Thank you, Ambika
g11701 is present twice in File 1. How should this be handled?
Yes, it's present twice, and this is just a sample. Some genes might be present more often than that, because a single gene might have several Pfam domains.
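One possible approach is a two-file awk lookup. This is only a minimal sketch under some assumptions: the file names file1.tsv, file2.tsv and merged.tsv are placeholders, a gene repeated in File 1 always carries the same value (so the first occurrence is kept), and the value is cut to two decimals by plain string truncation to match the sample output (2.839... becomes 2.83, not 2.84). The printf lines just recreate the sample data from the post so the example can be run as-is.

```shell
# Recreate the sample inputs from the post (placeholder file names).
printf 'g4989\t2.70224323450382\ng4650\t2.71483380183318\ng11701\t2.83907744860811\ng11701\t2.83907744860811\ng3807\t2.83912968405616\ng17931\t2.84821618321646\n' > file1.tsv
printf 'g4989\ng4650\tPfam\tPF00172\tFungalZn(2)-Cys(6)binuclearclusterdomain\ng11701\tPfam\tPF04082\tFungalspecifictranscriptionfactordomain\ng17931\tPfam\tPF04082\tFungalspecifictranscriptionfactordomain\n' > file2.tsv

awk 'BEGIN { FS = OFS = "\t" }
# First file: store the value per gene, keeping only the first occurrence.
NR == FNR {
    if (!($1 in val)) {
        dot = index($2, ".")
        # Assumption: truncate (not round) to two decimals, as in the sample output.
        val[$1] = dot ? substr($2, 1, dot + 2) : $2
    }
    next
}
# Second file: print gene, looked-up value, then the remaining columns.
{
    out = $1 OFS (($1 in val) ? val[$1] : "")
    for (i = 2; i <= NF; i++) out = out OFS $i
    print out
}' file1.tsv file2.tsv > merged.tsv

cat merged.tsv
```

Every line of File 2 is printed once, so a gene listed several times in File 2 (one line per Pfam domain) gets the value on each of its lines. Genes that appear only in File 1, such as g3807, are left out. If the full-precision value is wanted instead, the truncation line can be replaced with val[$1] = $2.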