Question

Create histogram in R

0


4 months ago

G.S ▴ 60

Hi all,

I created these histograms from the same data in RStudio, and I am wondering why the histograms generated by ggplot2 and base R look different.

My understanding is as follows. Base R (hist()): the number of bins is determined automatically unless specified otherwise using the breaks argument. This default behavior can lead to different bin widths compared to ggplot2.

ggplot2: You explicitly set the number of bins (bins = 30), so the binning will be consistent based on this. However, in base R, unless you control it, the binning algorithm might create fewer or more bins depending on the data distribution.
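One way to check this description is to inspect the breaks that hist() actually chooses. The sketch below uses simulated values as a stand-in for rsv_n_1_9hpi$polya_length (an assumption, since the real data is not shown). It relies on the documented behavior that hist() uses breaks = "Sturges" by default and treats a single number such as breaks = 30 as a suggestion that is then rounded to "pretty" breakpoints.

```r
# Simulated stand-in for rsv_n_1_9hpi$polya_length (hypothetical data)
set.seed(1)
x <- rgamma(5000, shape = 2, scale = 60)

# Default: breaks = "Sturges" suggests a bin count, then pretty() picks round edges
h_default <- hist(x, plot = FALSE)
nclass.Sturges(x)          # suggested number of bins
length(h_default$counts)   # number of bins actually used
h_default$breaks           # the bin edges

# breaks = 30 is also only a suggestion, so the result may not have exactly 30 bins
h_30 <- hist(x, breaks = 30, plot = FALSE)
length(h_30$counts)
h_30$breaks
```

Comparing length(h_30$counts) with 30 shows whether base R followed the suggestion exactly for a given dataset.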

Could someone please clarify this based on my code? Thanks in advance.

#### code for R base 

#####plot (A)
hist(rsv_n_1_9hpi$polya_length,
     #xlab = "HRSV_9_hpi",
     cex.lab = 1.5,
     cex.axis = 1.5,
     cex.main = 1.5,
     cex.sub = 1.5,
     ylab="Count",
     xlab="Poly(A) tail length")    


#####Plot(B) 
hist(rsv_n_1_9hpi$polya_length,
     #xlab = "HRSV_9_hpi",
     cex.lab = 1.5,
     cex.axis = 1.5,
     cex.main = 1.5,
     cex.sub = 1.5,
     ylab="Count",
     xlab="Poly(A) tail length",
     breaks = 30,
     ylim = c(0, 1500))

### code for ggplot2

####plot (C)
library(ggplot2)

b <- ggplot(rsv_n_1_9hpi, aes(x = polya_length)) +
  geom_histogram(bins = 30, fill = "blue",
                 color = "black", alpha = 0.7) +
  #xlab("Poly(A) tail length") +
  #ylab("Count") +
  theme_light() +
  # ylim(0, 1500) +
  labs(title = "HRSV (9_hpi, n=1)") +
  # Note: values outside these x limits are dropped before binning
  scale_x_continuous(limits = c(0, 1200), breaks = c(0, 200, 400, 600, 800, 1000)) +
  theme(plot.title = element_text(size = 15),
        axis.text = element_text(colour = "black", size = 13),
        axis.title.y = element_text(size = 13),
        legend.text = element_text(size = 13),
        strip.text.x = element_text(size = 13),
        axis.title.x = element_text(size = 13))




score 3 · Answer 1 · 2024-10-09

3


4 months ago

ATpoint 87k

Your base R version has fewer bins than the ggplot one, so you would need to tell base R to use the same bins to make them identical.
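As a minimal sketch of one way to do that, both functions can be given the same explicit vector of bin edges: hist() through breaks and geom_histogram() through its breaks argument. The data frame df, the simulated values, and the bin width of 40 are assumptions made for illustration and are not taken from the original post. Both functions use right-closed intervals by default, so the counts should agree bin by bin.

```r
library(ggplot2)

# Hypothetical stand-in for rsv_n_1_9hpi (simulated, all values > 0)
set.seed(1)
df <- data.frame(polya_length = rgamma(5000, shape = 2, scale = 60))

# One shared set of bin edges covering the full data range (width 40 is arbitrary)
brks <- seq(0, ceiling(max(df$polya_length) / 40) * 40, by = 40)

# Base R: an explicit vector of breaks is used as given, not as a suggestion
h <- hist(df$polya_length, breaks = brks, main = "",
          xlab = "Poly(A) tail length", ylab = "Count")

# ggplot2: the same edges passed to geom_histogram()
p <- ggplot(df, aes(x = polya_length)) +
  geom_histogram(breaks = brks, fill = "blue", color = "black", alpha = 0.7)
p

# Compare the per-bin counts of the two histograms
all(h$counts == layer_data(p)$count)
```

Note that hist() raises an error if the explicit breaks do not span the whole range of the data, which is why the upper edge is derived from max() here.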


0


Thanks. But could you please explain why the height of the bars in the plot changes when I adjust the number of bins?

Is this explanation correct? The height of the bars changes because increasing or decreasing the number of bins distributes the data across more or fewer bars.

More bins: the data is divided into smaller intervals, so each bin contains fewer data points, resulting in shorter bars.
Fewer bins: the data is grouped into larger intervals, so each bin contains more data points, resulting in taller bars.

Reply written 4 months ago by G.S ▴ 60

2


If you took all the bars in each plot and stacked them on top of each other, you should get the total number of points in your dataset.

Here's a random set of 1000 numbers from a uniform distribution between 0 and 100. As you increase the number of bins, your average bar height decreases.

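The original image is not reproduced here, but a comparable figure can be sketched as follows. The seed and the bin counts (5, 20, 50) are arbitrary choices for illustration and are not necessarily the ones used in the original plot.

```r
# 1000 random numbers from a uniform distribution between 0 and 100
set.seed(42)
x <- runif(1000, min = 0, max = 100)

op <- par(mfrow = c(1, 3))  # three panels side by side
for (k in c(5, 20, 50)) {
  # k equal-width bins over [0, 100], given as explicit edges
  brks <- seq(0, 100, length.out = k + 1)
  h <- hist(x, breaks = brks, main = paste(k, "bins"),
            xlab = "Value", ylab = "Count")
  # The bar heights always add up to 1000, so the mean height is 1000 / k
  cat(k, "bins: total =", sum(h$counts),
      " mean height =", mean(h$counts), "\n")
}
par(op)  # restore the previous plotting layout
```

The total stays at 1000 in every panel, while the mean bar height falls from 200 to 50 to 20 as the number of bins goes from 5 to 20 to 50.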

Reply written 4 months ago by yura.grabovska ▴ 690