Hi,
I have a file with many genes across the genome, each assigned a color according to the phenotype that a variant within the gene has been associated with. I would like to create a plot using circos or circlize that shows stacked layers where genes overlap, with colors assigned by phenotype (if a gene is associated with only one phenotype there will be a single layer, so nothing is stacked). The purpose is to see immediately which genes are associated with multiple phenotypes (from the stacking), and which phenotypes those are (from the colors). For example, the colors will show whether a gene has associations with one type of phenotype (e.g. cancer) or another (e.g. diabetes).
I was thinking of using the "tiles" plot in circos, with color-coded tiles. Is there an option to color-code the tiles based on another value? I have also tried 'highlights' and 'heatmap' (using the phenotype colors as the factor levels), but I don't think this is the way to go because I cannot see the overlaps with these plot types, and the overlaps are what I am mostly interested in.
If I use circlize, I would like to plot overlapping regions like in this example: http://jokergoo.github.io/circlize/example/gene_model.html
but using the colors already specified in the data file. If there is a way, could you please direct me to the right function?
This is an example of the data file in R:
df = structure(list(Chr = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr2", "chr2", "chr2", "chr3", "chr3", "chr4", "chr4", "chr6",
"chr6", "chr6", "chr7", "chr7", "chr7", "chr8", "chr8", "chr9",
"chr9", "chr10", "chr11", "chr12", "chr13", "chr13", "chr19",
"chr19", "chr20", "chr21", "chr22"), pos.start = c(10678425L,
159391160L, 109318306L, 154509258L, 229805966L, 26989551L, 202937054L,
16209774L, 142169092L, 8925911L, 113873068L, 78144140L, 29882328L,
31321038L, 2754229L, 91908370L, 149706362L, 4754575L, 105108497L,
81375712L, 107169073L, 95049590L, 117466805L, 125738394L, 123893076L,
73886275L, 29029377L, 48616438L, 48616760L, 16070165L, 18529136L,
19608500L), pos.end = c(11678425L, 160391160L, 110318306L, 155509258L,
230805966L, 27989551L, 203937054L, 17209774L, 143169092L, 9925911L,
114873068L, 79144140L, 30882328L, 32321038L, 3754229L, 92908370L,
150706362L, 5754575L, 106108497L, 82375712L, 108169073L, 96049590L,
118466805L, 126738394L, 124893076L, 74886275L, 30029377L, 49616438L,
49616760L, 17070165L, 19529136L, 20608500L), Gene = c("ANGPTL7",
"CCDC19", "CELSR2", "DCST1", "GALNT2", "ATRAID", "BMPR2", "FAM49A",
"PAQR9", "THUMPD3-AS1", "CAMK2D", "CNOT6L", "ABCF1", "RDBP",
"SLC22A23", "CDK6", "GIMAP7", "WIPI2", "LRP12", "ZNF704", "ABCA1",
"ASPN", "GFRA1", "ST3GAL4", "CCDC92", "KLF12", "MTUS2", "CA11",
"SPHK2", "KIF16B", "BTG3", "TRMT2A"), color = c("moccasin", "navy",
"moccasin", "yellow", "moccasin", "moccasin", "yellow", "cyan",
"yellow", "green", "goldenrod4", "magenta", "navy", "moccasin",
"moccasin", "yellow", "moccasin", "yellow", "moccasin", "yellow",
"moccasin", "cyan", "navy", "moccasin", "navy", "moccasin", "yellow",
"moccasin", "moccasin", "moccasin", "cyan", "moccasin")), .Names = c("Chr",
"pos.start", "pos.end", "Gene.name", "color"), row.names = c(917L,
953L, 956L, 1005L, 1087L, 1997L, 2003L, 2077L, 2534L, 2560L,
2937L, 2956L, 3495L, 5182L, 4625L, 6612L, 6642L, 6491L, 7060L,
7124L, 7487L, 7501L, 7991L, 8468L, 8897L, 9424L, 9471L, 11476L,
11226L, 11786L, 12117L, 12279L), class = "data.frame")
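To make the circlize idea more concrete, here is a rough sketch of what I am picturing, not a working solution. It uses four rows copied from the data above; `assign_layers` is a hypothetical helper of my own (not a circlize function) that puts each region in the lowest layer where it does not overlap an earlier region on the same chromosome. I am assuming that `circos.genomicRect` accepts per-region `ybottom`/`ytop` and `col` vectors, and that the default ideogram shipped with circlize is human.

```r
# Hypothetical helper: greedy layer assignment per chromosome.
# A region goes into the first layer whose last region ends before it starts.
assign_layers <- function(chr, start, end) {
  layer <- integer(length(chr))
  for (ch in unique(chr)) {
    idx <- which(chr == ch)
    idx <- idx[order(start[idx])]      # process regions left to right
    layer_end <- numeric(0)            # last end position stored per layer
    for (i in idx) {
      free <- which(layer_end < start[i])
      k <- if (length(free) > 0) free[1] else length(layer_end) + 1
      layer_end[k] <- end[i]
      layer[i] <- k
    }
  }
  layer
}

# Four rows taken from the data frame above (CA11 and SPHK2 overlap on chr19).
demo <- data.frame(
  Chr = c("chr1", "chr1", "chr19", "chr19"),
  pos.start = c(10678425, 159391160, 48616438, 48616760),
  pos.end = c(11678425, 160391160, 49616438, 49616760),
  Gene.name = c("ANGPTL7", "CCDC19", "CA11", "SPHK2"),
  color = c("moccasin", "navy", "moccasin", "moccasin"),
  stringsAsFactors = FALSE
)
demo$layer <- assign_layers(demo$Chr, demo$pos.start, demo$pos.end)
print(demo[, c("Gene.name", "color", "layer")])

# Plotting part: only runs if circlize is installed.
if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  # Assumption: the default ideogram bundled with circlize is human.
  circos.initializeWithIdeogram(chromosome.index = unique(demo$Chr))
  circos.genomicTrack(demo, ylim = c(0, max(demo$layer)), bg.border = NA,
    panel.fun = function(region, value, ...) {
      # One rectangle per gene, drawn in its layer with the color from the file.
      circos.genomicRect(region, value,
                         ybottom = value$layer - 0.9, ytop = value$layer - 0.1,
                         col = value$color, border = NA)
    })
  circos.clear()
}
```

With the real data, `demo` would be replaced by the full `df`; the track height would probably need adjusting once there are more than two or three layers.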
The part of the configuration file for the plot in circos is this:
<plots>
<plot>
type = tile
file = data/data1.txt
r0 = 0.98r
r1 = conf(.,r0)+0.03r
orientation = center
layers = 24
margin = 0.02u
thickness = 24
padding = 8
stroke_thickness = 0.001
stroke_color = vlgrey
</plot>
</plots>
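For the tile colors, one thing I could try is writing the color into each record of the data file as a `color=` option. Below is a minimal sketch in R of how the data file could be generated from the data frame, using two rows from the data above. The helper `to_circos_tiles` is hypothetical, and I am assuming that the karyotype uses the `hs1`, `hs2`, ... chromosome names and that every color name in the file is also defined in the Circos color configuration (R color names such as "moccasin" may not be).

```r
# Hypothetical helper: format each gene as "chr start end color=<name>".
to_circos_tiles <- function(d) {
  # Assumption: the karyotype file names chromosomes hs1, hs2, ...
  chr <- sub("^chr", "hs", d$Chr)
  # format() avoids scientific notation for large coordinates
  start <- format(d$pos.start, scientific = FALSE, trim = TRUE)
  end <- format(d$pos.end, scientific = FALSE, trim = TRUE)
  paste(chr, start, end, paste0("color=", d$color))
}

# Two rows taken from the data frame above.
tiles_demo <- data.frame(
  Chr = c("chr19", "chr19"),
  pos.start = c(48616438L, 48616760L),
  pos.end = c(49616438L, 49616760L),
  color = c("moccasin", "moccasin"),
  stringsAsFactors = FALSE
)
tile_lines <- to_circos_tiles(tiles_demo)
writeLines(tile_lines)
# To save for Circos: writeLines(tile_lines, "data/data1.txt")
```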
However, I can't get the colors in the tiles plot to show up correctly. Some colors do not appear at all (maybe because those tiles overlap with too many others? Is there a way to prioritise which color is plotted first?), and I get black lines even though none of the phenotypes uses the color 'black' (again, maybe because these are areas with too many overlaps?). I have tried adjusting the layers and the stroke_thickness, but the black lines remain and the correct colors still do not show. I am attaching the plot I am getting now.
I really appreciate any suggestions!
Thank you in advance for your help!
Fra
Pgibas kindly suggested using circlize in R. Can I get something like this?
http://jokergoo.github.io/circlize/example/gene_model.html
I tried this so far, but it gives me an error:
Error in n - I : non-numeric argument to binary operator