Hi, I have gene presence/absence data for approximately 60 genomes. I have created a matrix in which each gene family gets a value of 1 if it is present in a genome and 0 if it is absent. I want to cluster the strains by how similar they are in the genes they share, and also cluster the gene families by how they are shared across strains. I know R can do hierarchical clustering, but I am looking for something more visual, such as a heat map or a correlation plot.
My data look like this. Any idea what method would be best to represent this data?
GROUP Pla302278PT Pla3988 PmaH7608 Pma90_32 PtoDC3000 PmaM6 PmaM4a Pto1108 PtoT1 PtoK40 PtoMax13 Pav631 Pmp302280PT Pan302091 Ptt50252 Pja301072 Ppi1704B Pav037 Pav013 Pac302273 PacA10853 PsyB728a PssB48 PssA2 PsCit7 Psv4352 Psv3335 PmyAZ84488 PmpFTRS_U7 Pae3681 Pae0893_23 Pla301315 Pla107 PlaYM7902 Pta11528 Pta6606 PseHC_1 Pmo301020 PgyB076 PgyUnB647 PgyBR1 PgyKN44 PgyLN10 PphY5_2 PmaKN91 PphNPS3121 PphHB10Y Pph1448A PmeN6801 Pph1302A PmaYM7930 PmaES4326 Pci0788_9 Por36_1
OrthoGroup7591.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
OrthoGroup13947.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0
OrthoGroup6352.114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
OrthoGroup3637.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Please look carefully at your data and make sure that what is shown above is in the correct format (I am assuming your data are not actually in paragraph form). Also, just post a sample, not the whole data set.
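For what it is worth, here is a minimal sketch in plain Python of one common starting point: a pairwise Jaccard distance between strains, computed from the presence/absence columns. Everything in it is an assumption on my part, not something taken from your post: the function names (`parse_matrix`, `jaccard_distance`, `strain_distances`) are made up for illustration, the toy table is invented, and any non-zero entry is treated as "present". Your sample also contains values such as 3 and -1, so please check what those mean before binarizing them this way.

```python
from itertools import combinations


def parse_matrix(text):
    """Parse a whitespace-delimited table: a header row of strain names,
    then one row per gene family (family name followed by integer values)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    strains = lines[0][1:]  # drop the leading "GROUP" label
    families, rows = [], []
    for fields in lines[1:]:
        families.append(fields[0])
        # ASSUMPTION: any non-zero value counts as "present" (1), zero as absent (0).
        rows.append([1 if int(v) != 0 else 0 for v in fields[1:]])
    return strains, families, rows


def jaccard_distance(a, b):
    """1 - |shared families| / |families present in either strain|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def strain_distances(strains, rows):
    """Symmetric strain-by-strain Jaccard distance matrix (list of lists)."""
    columns = list(zip(*rows))  # one presence/absence vector per strain
    n = len(strains)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(columns[i], columns[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical toy table in the same layout as the sample above.
toy = """GROUP S1 S2 S3
G1 1 1 0
G2 1 1 0
G3 0 3 1
"""

strains, families, rows = parse_matrix(toy)
dist = strain_distances(strains, rows)
for name, row in zip(strains, dist):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

A distance matrix like this can then be passed to whatever hierarchical clustering and heat map routine you prefer. If you stay in R, `dist(x, method = "binary")` computes the same kind of Jaccard-style distance, and base R's `heatmap()` draws the matrix with row and column dendrograms, which may already be close to the picture you are after.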