Hi everyone!
I am trying to find the best way to make two boxplots for a specific gene, using the data in that gene's row for two subsets of columns in data frame x.
x has 634 rows and 128 columns.
Each row is specific to a gene.
Column 1 has the gene name; say I want to look at the gene in row 1.
Columns 2:48 hold the data I want to include in one boxplot.
Columns 49:128 hold the data I want to include in the other boxplot.
The data frame looks something like this:
gene accepted_hits_x1.bam accepted_hits_x1.bam etc....
1 AARS1 -6 0 etc....
I also want to see each individual data point that makes up a boxplot drawn on the plot.
I am having a problem:
My data (residuals from the mean, i.e. x value - mean) are a mix of positive and negative values, and this plot appears to exclude the negative ones.
library(ggplot2)
# IGF1R row, columns 2:128, flattened to a vector
data <- unlist(subset(datavr, gene == "IGF1R", select = 2:128))
# Group 1 = columns 2:48 (47 values); group 2 = columns 49:128 (80 values)
news <- data.frame(data = data, factor = c(rep(1, 47), rep(2, 80)))
news$data <- (log10(as.numeric(news$data)) + 1)
g <- ggplot(data = news, aes(x = as.factor(factor), y = data))
g + geom_boxplot() + geom_point(color = "purple", size = 3) +
  xlab("A38-41 A38-5 ") + ylab("log10(Residual from Mean)+1") +
  ggtitle("IGF1R inside region") + theme(plot.title = element_text(face = "bold"))
The problem is that it keeps giving me a warning saying:
Removed 110 rows containing missing values (geom_point)
Could this be because some of the values are negative, so log10(value)+1 cannot be computed for them?
Are you trying to make a boxplot of some specific gene?
Correct, but within the data frame I have information for 2 cell types and those are found:
I just edited to clarify.
Do you need to do the log transformation? That is what is introducing your NaNs. The boxplot will plot negative numbers if you want to keep them non-transformed.
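To make the cause concrete, here is a minimal sketch with made-up numbers (the vector `resid` is hypothetical, not taken from your data). In R, `log10()` of a negative number returns `NaN` with a warning, and `log10(0)` returns `-Inf`, so adding 1 afterwards does not help. Rows that end up as `NaN` are the kind ggplot2 reports as removed.

```r
# Hypothetical residuals (value minus mean); not the poster's data.
resid <- c(-6, -0.5, 0, 3, 120)

# log10() is undefined for negative inputs, so R returns NaN (with a warning).
# log10(0) is -Inf, and -Inf + 1 is still -Inf.
transformed <- suppressWarnings(log10(resid) + 1)
print(transformed)

# Count how many values became NaN after the transformation.
n_nan <- sum(is.nan(transformed))
print(n_nan)
```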
If you need to do the log transformation, do it like this instead:
Some of my libraries have 0 counts, so when I compute the residual from the mean for that particular gene in those libraries, some of the values end up negative.
These are being excluded from the plot when I do the log transformation. Yet following your advice and running
allows for all values to be plotted.
However, because of some outliers, I am using the log.
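One option sometimes used when outliers call for a log scale but the data contain negative values is a signed log transform, sign(x) * log10(|x| + 1). This is only a sketch under my own assumptions, not something proposed earlier in this thread; `signed_log10` and the example vector are hypothetical names. Note that it changes how the y-axis should be read, so the axis label would need to say so.

```r
# Hypothetical helper: compress large magnitudes while keeping the sign.
# abs(x) + 1 keeps the argument of log10() at 1 or above, so the result
# is never NaN or -Inf for finite input.
signed_log10 <- function(x) {
  sign(x) * log10(abs(x) + 1)
}

# Hypothetical residuals (value minus mean); not the poster's data.
resid <- c(-99, -6, 0, 9, 999)
scaled <- signed_log10(resid)
print(scaled)
```

In the code from the question, this would take the place of the `log10(...) + 1` line, for example `news$data <- signed_log10(as.numeric(news$data))`.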