Hello all,
I have a file "file1.txt" which initially looks like this:
Orthogroup F105 F109 F23 F79 HDV247 T415
OG0006155 F105|108872
OG0006156 F105|114651
OG0006157 F105|115307
OG0006158 F105|121488
OG0006551 F109|843828
OG0006552 F109|844465
OG0006553 F109|845048
OG0006557 F23|102768
OG0006558 F23|106636
OG0006559 F23|108691
OG0006560 F23|108697
OG0006841 F79|103483
OG0006842 F79|103507
OG0006843 F79|165341
OG0006844 F79|175705
OG0006990 HDV247|10004
OG0006991 HDV247|1003
OG0006992 HDV247|10048
OG0006993 HDV247|10077
OG0006994 HDV247|10100
OG0006995 HDV247|10102
OG0008562 T415|110675
OG0008563 T415|115534
I am trying to assign a 1 or 0 to each of these columns depending on whether a gene is present or absent, so the output would look something like this:
Orthogroup F105 F109 F23 F79 HDV247 T415
OG0006155 1 0 0 0 0 0
OG0006156 1 0 0 0 0 0
OG0006157 1 0 0 0 0 0
OG0006158 1 0 0 0 0 0
OG0006551 0 1 0 0 0 0
OG0006552 0 1 0 0 0 0
OG0006553 0 1 0 0 0 0
OG0006557 0 0 1 0 0 0
OG0006558 0 0 1 0 0 0
OG0006559 0 0 1 0 0 0
OG0006560 0 0 1 0 0 0
OG0006841 0 0 0 1 0 0
OG0006842 0 0 0 1 0 0
OG0006843 0 0 0 1 0 0
OG0006844 0 0 0 1 0 0
OG0006990 0 0 0 0 1 0
OG0006991 0 0 0 0 1 0
OG0006992 0 0 0 0 1 0
OG0006993 0 0 0 0 1 0
OG0006994 0 0 0 0 1 0
OG0006995 0 0 0 0 1 0
OG0008562 0 0 0 0 0 1
OG0008563 0 0 0 0 0 1
So far I have been able to replace each column with 1 separately with the following code:
grep "F105" file1.txt | sed 's/F105|[0-9]*/1/g' > F105.genecount
grep "F109" file1.txt | sed 's/F109|[0-9]*/1/g' > F109.genecount
which assigns "1" to genes that are present. The problem is that I have to make multiple files, and when I concatenate them into one at the end, the "1" always ends up in the second column rather than in its respective column. How can I get the desired output in a neat way? Please help.
Thank you JC, this code works!!!
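For anyone landing here later, below is a minimal sketch of one way to do this in a single pass with awk. It is not necessarily the code JC posted, just an illustration. The grep/sed approach puts the "1" in the second column because each input line only has two fields, so the replacement lands wherever the gene was. The sketch instead looks up the strain's column from the header line. It assumes the file is whitespace-separated, the first line is the header, and every gene starts with its strain name followed by "|". The function name to_presence_absence is made up for this example.

```shell
# to_presence_absence is a hypothetical helper name, not from the original post.
# Usage: to_presence_absence file1.txt
to_presence_absence() {
    awk '
    NR == 1 {
        # Header line: remember which column each strain name belongs to.
        for (i = 2; i <= NF; i++) col[$i] = i
        ncol = NF
        print
        next
    }
    {
        # Start every strain column at 0 for this orthogroup.
        for (i = 2; i <= ncol; i++) present[i] = 0

        # Each gene looks like STRAIN|ID; set that strain column to 1.
        for (i = 2; i <= NF; i++) {
            split($i, part, "|")
            if (part[1] in col) present[col[part[1]]] = 1
        }

        # Rebuild the row: orthogroup ID followed by one 0/1 per strain.
        line = $1
        for (i = 2; i <= ncol; i++) line = line " " present[i]
        print line
    }' "$1"
}
```

It could then be run as to_presence_absence file1.txt > file1.presence_absence.txt (the output file name is only an example). Because the column is taken from the header, no per-strain intermediate files are needed, and a row with genes from several strains would get a 1 in each matching column.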