boxplot with jitter
5.4 years ago

Hi,

I am trying to make a box plot with the following code.

NDVI_ts <- read.table("merge_2_NC.csv", header = TRUE)
NDVI_ts
library(ggplot2)
library(scales)
library(tidyverse)
library(ggpubr)
NormalvsCancer<-interaction(NDVI_ts$Data, sep="\t")

######Outfile name as input file
pdf("boxplot.pdf")
##########
xlabs <- paste(levels(NDVI_ts$Data),"\n(N=",table(NDVI_ts$Data),")",sep="")
#ggplot(df,aes(x=group,y=x,color=group))+geom_boxplot()+scale_x_discrete(labels=xlabs)
p2 = ggplot(NDVI_ts, aes(x=NormalvsCancer, y=Read_count)) +
  geom_point(aes(fill=NormalvsCancer), size=5, shape=21, colour="grey20",
             position=position_jitter(width=0.2, height=0.1)) +
  geom_boxplot(aes(fill = Data), width = 0.6, outlier.colour=NA, fill=NA) + 
  scale_x_discrete(labels=xlabs) +
  theme_bw() + theme(axis.text.x = element_text(angle = 360, hjust = 1)) +
  stat_compare_means(aes(group = Data), label = "p.format")
print(p2)
dev.off()

I am getting the output image like this:

enter image description here

Everything is fine except for Bladder_normal(19): it should show N=0, but the plot shows N=1.

The csv input file is this.

Data    No_matched  Read_count
Bladder_tumor(414)  1   2
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   10
Bladder_tumor(414)  1   24
Bladder_tumor(414)  1   3
Bladder_tumor(414)  1   8
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   2
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   2
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_tumor(414)  1   1
Bladder_normal(19)  0   0

Any help is much appreciated.

Thanks


I changed the link so the image displays properly. What is the actual question now? I do not really follow.

OP is bypassing factors using logic that doesn't make sense to me. OP, instead of

NDVI_ts <- read.table("merge_2_NC.csv", header = TRUE)
..
..
xlabs <- paste(levels(NDVI_ts$Data),"\n(N=",table(NDVI_ts$Data),")",sep="")

try

NDVI_ts <- read.table("merge_2_NC.csv", header = TRUE, stringsAsFactors = FALSE)
..
..
# sort() keeps the names in the same alphabetical order that table() uses for its counts
xlabs <- paste(sort(unique(NDVI_ts$Data)),"\n(N=",table(NDVI_ts$Data),")",sep="")

Even better, split the first column into two columns (category and count) so you're dealing with atomic data.
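
The split suggested above could look something like the sketch below. It uses a small toy data frame that only mimics the layout of the input file, and the column names category and cohort_size are made up for illustration. It also assumes every value in Data ends with a number in parentheses.

```r
# Toy data mimicking the layout of the input file (values are illustrative).
df <- data.frame(
  Data = c("Bladder_tumor(414)", "Bladder_tumor(414)", "Bladder_normal(19)"),
  No_matched = c(1, 1, 0),
  Read_count = c(2, 1, 0),
  stringsAsFactors = FALSE
)

# Everything before the trailing "(number)" is the category (hypothetical column name).
df$category <- sub("\\([0-9]+\\)$", "", df$Data)

# The digits inside the trailing "(number)" are the cohort size (hypothetical column name).
df$cohort_size <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", df$Data))
```

With the category and the number in separate columns, the axis labels can be built from the category alone, and the number stays available as a proper integer.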

Ok, thanks. I will try this.

I want N to show the number of normal and tumor samples. If the number matched is 0, the label should reflect that: Bladder_normal(19) should show N=0 and Bladder_tumor(414) should show N=17.

I think you have the header wrong; shouldn't it be NormalvsCancer? Anyhow, N=1 means one sample.

Actually, there are no normal samples, but I want to show that the number of normal samples is 0.

Your xlabs don't match up with the actual data. Split the first column so you have a clean data.frame and build the axis labels in a better manner.

I got lost. Shouldn't table(NDVI_ts$Data) return 1 for Bladder_normal(19)?

It does, and it should, because table simply counts how often each string occurs :)

I don't understand the use of NormalvsCancer<-interaction(NDVI_ts$Data, sep="\t"). Why don't you just use Data? The interaction function returns an unordered factor and is unnecessary here.

Thanks for pointing it out. I will correct it.

5.4 years ago
ravipatel4 ▴ 50

Change the code that creates xlabs to the following; it should fix your problem:

xlabs <- paste(levels(NDVI_ts$Data), "\n(N=",
               table(NDVI_ts[NDVI_ts$No_matched != 0, ]$Data), ")", sep = "")

Thank you, Ravi. That finally solved it without much modification.

Be aware that this will only work when NDVI_ts$Data is a factor. For character columns, levels(df$col_name) will be NULL, and table(subset_of_df) will not include values in a column that are not part of that subset. In your case, Bladder_normal(19) will be excluded from the table if Data is a character column.
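
To make the difference concrete, here is a small sketch on toy data (not the real file; the counts are illustrative). It compares what table returns after filtering on No_matched when Data is a character column versus a factor.

```r
# Toy data: three matched tumor rows and one unmatched normal row (illustrative).
df <- data.frame(
  Data = c(rep("Bladder_tumor(414)", 3), "Bladder_normal(19)"),
  No_matched = c(1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Character column: the group with no remaining rows vanishes from the table.
counts_chr <- table(df$Data[df$No_matched != 0])

# Factor column: levels survive the subsetting, so the empty group is counted as 0.
df$Data <- factor(df$Data)
counts_fct <- table(df$Data[df$No_matched != 0])

# levels() and table() share the same order, so labels and counts line up.
xlabs <- paste0(levels(df$Data), "\n(N=", counts_fct, ")")
```

Here counts_chr has a single entry, while counts_fct keeps both groups, with 0 for Bladder_normal(19). Converting the column explicitly with factor() avoids depending on how read.table handles strings by default.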

5.4 years ago
ATpoint 85k

Don't use table the way you do, as it simply counts how often each string occurs, regardless of the No_matched column. Do something like sum(as.numeric(NDVI_ts$No_matched)) to use that column and its numeric content.
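
One possible way to apply that idea per group is sketched below on toy data (illustrative values, not the real file). Using tapply for the grouping is an assumption on my part; the suggestion above only mentions sum.

```r
# Toy data: three matched tumor rows and one unmatched normal row (illustrative).
df <- data.frame(
  Data = c(rep("Bladder_tumor(414)", 3), "Bladder_normal(19)"),
  No_matched = c(1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Sum No_matched within each group; groups come back in sorted order.
n_per_group <- tapply(as.numeric(df$No_matched), df$Data, sum)

# Build the axis labels from the group names and their sums.
xlabs <- paste0(names(n_per_group), "\n(N=", n_per_group, ")")
```

Because the sums are taken from No_matched rather than from row counts, a group whose only row has No_matched equal to 0 is labelled N=0, and this works whether Data is a character column or a factor.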
