I don't have a grouping variable, just 2 columns of counts of bacterial families present in every person.
If you have only two columns of data, there is essentially nothing to color by in a scatter plot. To confirm this, it would be helpful to show a little of your data. You might try to write out the question that would be answered by using color.
On the other hand, if you had some third type of data you wanted to represent with color, you could hand that column name to the color argument, as iraun points out:
library(ggplot2)
library(ggpubr)
# create some toy data
df <- data.frame(Lachnospiraceae = runif(60, 1, 1900), Akkermansiaceae = runif(60, 1, 2200))
# add something to color by
df$class <- sample(letters[1:3], nrow(df), replace = TRUE)
ggscatter(df, x = "Lachnospiraceae", y = "Akkermansiaceae",
          color = "class",
          add = "reg.line", conf.int = TRUE,
          add.params = list(color = "brown"),
          cor.coef = TRUE, cor.method = "spearman",
          cor.coef.coord = c(1900, 2200),
          cor.coef.size = 4,
          xlab = "Lachnospiraceae", ylab = "Akkermansiaceae")
If you have only two columns of data but want to highlight some characteristic of the data itself (e.g. value ranges), you would still have to add a third column holding that characteristic. For example:
# create some toy data classified by value range
x <- runif(60, 1, 1900)
y <- runif(60, 1, 2200)
# sort both vectors so the toy data show a clear trend
x <- sort(x)
y <- sort(y)
df <- data.frame(Lachnospiraceae = x, Akkermansiaceae = y)
# add something to color by: "a" below 600, "b" from 600 up to 1200, "c" otherwise
df$class <- "c"
df$class[x < 1200] <- "b"
df$class[x < 600] <- "a"
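The same value-range column could also be built in one step with base R's cut() and then handed to ggscatter's color and palette arguments. This is only a sketch on toy data: the seed, the break points (chosen to match the 600 and 1200 thresholds above), and the three colour names are my own assumptions, not something from your data.

```r
library(ggpubr)

set.seed(1)  # assumption: fixed seed so the toy data are reproducible
df <- data.frame(Lachnospiraceae = sort(runif(60, 1, 1900)),
                 Akkermansiaceae = sort(runif(60, 1, 2200)))

# bin the x values into three ranges: [-Inf, 600), [600, 1200), [1200, Inf)
df$class <- cut(df$Lachnospiraceae,
                breaks = c(-Inf, 600, 1200, Inf),
                labels = c("a", "b", "c"),
                right = FALSE)

# color the points by the new column; the palette colours are illustrative
p <- ggscatter(df, x = "Lachnospiraceae", y = "Akkermansiaceae",
               color = "class",
               palette = c("steelblue", "darkorange", "firebrick"))
print(p)
```

Using cut() keeps the thresholds in one place, so changing a range only means editing the breaks vector.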
I don't necessarily understand everything in your R code, but I'm pretty sure that what you want can't be done as it stands. It may sometimes seem that plotting functions can perform magic, but they usually need to be told which class each data point belongs to before they can color by it. I think you need to add a third column in which each data point gets a category corresponding to the colors you wish to use, and then color by that column.
What are the criteria for colouring the data points? Once you have them, you can use the color and palette arguments inside the ggscatter function.
Apparently, it's more difficult than I thought. My simple idea was that I have 2 columns of data, and all I wanted was one column in one colour and the other in a different colour, instead of everything in the same colour. When I added to the second line:
But think this out: you have an x,y plot, which means that each point on your plot consists of two values: one from Lachnospiraceae, the other from Akkermansiaceae. For example, a given point might be (x, y) = (83, 102). Which number should be used to determine the color? What does the color of a single point indicate? What question are you trying to answer with x, y? What additional question are you trying to answer with a color? Are you looking for a different kind of plot? Maybe two boxplots, one colored for L, the other for A (which would address the question: does the distribution of L differ from that of A?).
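If the two-boxplot idea is closer to what you want, one way to sketch it is to stack the two count columns into a single long table, so that the family name itself becomes the column to color by. The toy data, the names long, count and family, and the two colours are all assumptions of mine for illustration.

```r
library(ggpubr)

set.seed(1)  # assumption: fixed seed so the toy data are reproducible
df <- data.frame(Lachnospiraceae = runif(60, 1, 1900),
                 Akkermansiaceae = runif(60, 1, 2200))

# stack the two count columns into long format: one row per (person, family)
long <- stack(df)
names(long) <- c("count", "family")

# one box per family, each in its own colour, with the raw points jittered on top
p <- ggboxplot(long, x = "family", y = "count",
               color = "family",
               palette = c("steelblue", "firebrick"),
               add = "jitter")
print(p)
```

Here each value belongs to exactly one family, so "one colour per column" has a clear meaning, which it does not have in the scatter plot where every point combines both families.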