Hi everyone, I am making some plots to visualize my sequencing data, and I am struggling to change the colors of some of my points using ggplot2. I am using the viridis palette, but I want ONLY the points with percid = 0 (0.000 in the data frame) to be plotted in grey. In other words, I want to override the viridis scheme only where percid = 0; any point with a percid other than 0 should keep the regular viridis colors. Note that a data point in ANY order can have percid = 0. Therefore, none of the legend colors should be grey, and there shouldn't be a separate "Zero Percent Identity" category. The only thing that should change is the color of the points (of any order or kmer_cov) with percid = 0, which should become grey. I am plotting kmer_cov on the x axis (log scale) and percid on the y axis, and this is my code:
ggplot(contiginfo, aes(x = kmer_cov, y = percid, size = querylength, color = factor(order))) +
  geom_point(aes(color = ifelse(percid == 0, "Zero Percent Identity", order))) +
  scale_x_log10() +  # Use a logarithmic scale for the x-axis
  scale_size_continuous(range = c(1, 10)) +  # Adjust the size range as needed
  scale_color_manual(values = c("grey", viridis::viridis_pal()(length(unique(contiginfo$order))))) +  # Use grey for zero percent identity, Viridis for others
  labs(x = "kmer_cov", y = "percid", size = "Query Length", color = "Order") +
  theme_minimal() +
  theme(axis.title.x = element_text(size = 12),  # Change size of x-axis label
        axis.title.y = element_text(size = 12)) +
  guides(color = guide_legend(override.aes = list(size = 4)))
Now this is what I get as output:
Any advice is appreciated, thanks!
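A likely cause, and one way around it (a sketch, not the only option): the `ifelse()` call in the code above injects the label "Zero Percent Identity" into the color mapping, so ggplot2 treats it as one more level of the discrete scale. That is why it appears in the legend. Because `scale_color_manual()` assigns unnamed `values` to the levels in sorted order, "grey" goes to whichever level sorts first rather than necessarily to the zero-identity points (and if `order` is stored as a factor, `ifelse()` can return its integer codes instead of its labels). Splitting the data into two layers avoids both problems: the non-zero points are mapped to `order` and colored by the viridis scale, while the percid == 0 points are drawn on top in a fixed (set, not mapped) grey, which never enters the color legend. The data frame below is made-up toy data, and `Order_A` to `Order_D` are hypothetical placeholders for the real `contiginfo`.

```r
library(ggplot2)

# Hypothetical toy data standing in for `contiginfo`; replace with the real data frame.
contiginfo <- data.frame(
  kmer_cov    = c(5, 12, 40, 150, 8, 300, 22, 75),
  percid      = c(0, 85.2, 97.1, 0, 66.4, 99.0, 0, 91.3),
  querylength = c(1200, 3400, 800, 5000, 2100, 9000, 1500, 2600),
  order       = c("Order_A", "Order_A", "Order_C", "Order_D",
                  "Order_B", "Order_C", "Order_D", "Order_B")
)

# Set the factor levels on the full data before subsetting, so every order
# keeps a stable viridis color and legend entry.
contiginfo$order <- factor(contiginfo$order)

nonzero <- subset(contiginfo, percid != 0)
zero    <- subset(contiginfo, percid == 0)

p <- ggplot(contiginfo, aes(x = kmer_cov, y = percid, size = querylength)) +
  # Layer 1: non-zero points, colored by order via the viridis scale.
  geom_point(data = nonzero, aes(color = order)) +
  # Layer 2: zero-identity points in a fixed grey. `color` is set, not mapped,
  # so this layer adds nothing to the color legend.
  geom_point(data = zero, color = "grey60", show.legend = FALSE) +
  scale_x_log10() +
  scale_size_continuous(range = c(1, 10)) +
  # drop = FALSE keeps legend entries for orders whose points are all percid == 0.
  scale_color_viridis_d(drop = FALSE) +
  labs(x = "kmer_cov", y = "percid", size = "Query Length", color = "Order") +
  theme_minimal() +
  guides(color = guide_legend(override.aes = list(size = 4)))

print(p)
```

In the toy data, `Order_D` only has percid = 0 points, so without `drop = FALSE` it would silently disappear from the legend. Another route is to map the zero-identity points to `NA` and set the scale's `na.value = "grey60"`, but by default a discrete scale then adds an "NA" key to the legend, which is exactly what the question wants to avoid.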
Completely agree with you; I am still figuring out how to logically organize my data and decide what I am trying to communicate. I'm going to try grouping the colors by superkingdom. By the way, this is a virome sequencing project, which explains why so many of the sequences were classified as viral. Thanks for the help and suggestions!
Progress update:
Better. But at least visually, there doesn't seem to be a correlation between kmer_cov and percid, so is there really a reason to plot the two dimensions against each other? Is your Query Length a normalization factor? In that case, you could plot the normalized values directly...

PS: Shape could be a useful aesthetic to visually discriminate Viruses, Eukaryota and Bacteria in a dot plot.
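Both suggestions can be tried quickly. The sketch below first puts a number on the "at least visually" impression with Spearman's rank correlation (a swapped-in choice here: it is rank-based, so it does not care about the log10 x-axis), then maps a `superkingdom` column to `shape` while keeping color for order and the grey override for percid = 0. The `superkingdom` column name and all data values are hypothetical toy stand-ins, not the poster's real data.

```r
library(ggplot2)

# Hypothetical toy data: `superkingdom` is an assumed column, and the
# Order_* labels are placeholders for real taxonomic orders.
contiginfo <- data.frame(
  kmer_cov     = c(5, 12, 40, 150, 8, 300, 22, 75),
  percid       = c(0, 85.2, 97.1, 0, 66.4, 99.0, 0, 91.3),
  querylength  = c(1200, 3400, 800, 5000, 2100, 9000, 1500, 2600),
  order        = factor(c("Order_A", "Order_A", "Order_C", "Order_D",
                          "Order_B", "Order_C", "Order_D", "Order_B")),
  superkingdom = c("Viruses", "Viruses", "Bacteria", "Eukaryota",
                   "Viruses", "Bacteria", "Eukaryota", "Viruses")
)

# Spearman's rho uses ranks, so a monotonic transform such as log10 does not change it.
rho <- cor(contiginfo$kmer_cov, contiginfo$percid, method = "spearman")
print(rho)

p2 <- ggplot(contiginfo, aes(x = kmer_cov, y = percid, size = querylength,
                             shape = superkingdom)) +
  # Non-zero points: color by order, shape by superkingdom.
  geom_point(data = subset(contiginfo, percid != 0), aes(color = order)) +
  # Zero-identity points: fixed grey, but they still inherit the shape mapping,
  # so their superkingdom remains readable.
  geom_point(data = subset(contiginfo, percid == 0), color = "grey60",
             show.legend = FALSE) +
  scale_x_log10() +
  scale_size_continuous(range = c(1, 10)) +
  scale_color_viridis_d(drop = FALSE) +
  labs(x = "kmer_cov", y = "percid", size = "Query Length",
       color = "Order", shape = "Superkingdom") +
  theme_minimal()

print(p2)
```

Shape suits a variable with only a few levels: ggplot2's default shape palette distinguishes at most six, so it fits three superkingdoms but not dozens of orders, which is why color stays on `order` here.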