Hello everyone,
I have File 1 with 6 columns:
**geneId** "\t" **scaffold** "\t" **LocusID** "\t" **pfam** "\t" **PfamID** "\t" **function**
1 "\t" aa "\t" XL_012 "\t" pfam "\t" P1234 "\t" .............
2 "\t" bb "\t" XL_013 "\t" pfam "\t" P1324"\t ...........
3 "\t" cc "\t" XL_014 "\t" pfam "\t" P1624"\t ...........
3"\t" cc "\t" XL_014 "\t" pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
4 "\t" dd "\t" XL_015 "\t" pfam "\t" P1326"\t ...........
5 "\t" ee"\t" XL_016
File 2 with 2 columns
**LocusID** "\t" **Expression**
XL_012 "\t" 23
XL_013 "\t" 20
XL_014 "\t" 15
XL_015 "\t" 10
XL_016 "\t" 9
Desired output : I want to merge column expression to file1 based on locus ID:
**geneId** "\t" **scaffold** "\t" **LocusID** "\t" **Expression** "\t" **pfam** "\t" **PfamID** "\t" **function**
1 "\t" aa "\t" XL_012 "\t" 23 "\t" pfam "\t" P1234 "\t" .............
2 "\t" bb "\t" XL_013 "\t" 20 "\t" pfam "\t" P1324"\t ...........
3 "\t" cc "\t" XL_014 "\t" 15 "\t" pfam "\t" P1624"\t ...........
3"\t" cc "\t" XL_014 "\t" 15 "\t"pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" 10 "\t"pfam "\t" P1326"\t ...........
4"\t" dd "\t" XL_015 "\t" 10 "\t"pfam "\t" P1326"\t ...........
4 "\t" dd "\t" XL_015 "\t" 10 "\t" pfam "\t" P1326"\t ...........
5 "\t" ee"\t" XL_016 "\t"9
I could do this easily if I had same number of rows for both files but here since I have different number of rows and some repetitions of locus ID due to many pfam domains for single gene I am super confused. Please help me figure out this one using awk, perl or bash. Many thanks in Advance!!!!
Thank you,
Ambika
Man! Where does this come from? I am eager to know ;)