Create histogram in R
12 weeks ago
G.S ▴ 60

Hi all,

I created these histograms from the same data in RStudio, but I am wondering why the histograms generated by ggplot2 and base R look different.

My understanding is as follows.

Base R (hist()): the number of bins is determined automatically unless you specify it with the breaks argument. This default behavior can lead to different bin widths compared to ggplot2.

ggplot2: I explicitly set the number of bins (bins = 30), so the binning is fixed by that setting. In base R, unless you control it, the binning algorithm may create fewer or more bins depending on the number of observations and the range of the data.
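A minimal sketch of how hist() picks its bins, using simulated data because the real rsv_n_1_9hpi data is not shown (the rexp() values below are a hypothetical stand-in). By default hist() uses breaks = "Sturges", i.e. nclass.Sturges(x) = ceiling(log2(n) + 1) suggested bins, and then passes that number to pretty(), which rounds the break points to "nice" values. A single number such as breaks = 30 goes through the same pretty() step, so it is a suggestion rather than a fixed count.

```r
# Hypothetical stand-in for rsv_n_1_9hpi$polya_length (real data not shown)
set.seed(1)
x <- rexp(5000, rate = 1 / 150)

# Default rule: breaks = "Sturges", i.e. ceiling(log2(n) + 1) suggested bins
k <- nclass.Sturges(x)
k                                   # 14 for n = 5000

# hist() rounds the suggestion to "nice" break points with pretty()
h_default <- hist(x, plot = FALSE)
all.equal(h_default$breaks, pretty(range(x), n = k, min.n = 1))
length(h_default$counts)            # number of bins actually used

# A single number such as breaks = 30 is also only a suggestion
h30 <- hist(x, breaks = 30, plot = FALSE)
length(h30$counts)                  # may differ from 30
```

Checking length(h$counts) like this on the real data shows how many bins plots (A) and (B) really used.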

Could someone please clarify this based on my code? Thanks in advance.

#### code for base R

#####plot (A)
hist(rsv_n_1_9hpi$polya_length,
     #xlab = "HRSV_9_hpi",
     cex.lab = 1.5,
     cex.axis = 1.5,
     cex.main = 1.5,
     cex.sub = 1.5,
     ylab="Count",
     xlab="Poly(A) tail length")    


#####Plot(B) 
hist(rsv_n_1_9hpi$polya_length,
     #xlab = "HRSV_9_hpi",
     cex.lab = 1.5,
     cex.axis = 1.5,
     cex.main = 1.5,
     cex.sub = 1.5,
     ylab="Count",
     xlab="Poly(A) tail length",
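     # breaks = 30 is only a suggestion: hist() passes it to pretty(),
     # so the final number of bins can differ from 30
     breaks = 30,
     ylim = c(0, 1500))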

### code for ggplot2

####plot (C)
library(ggplot2)

b <- ggplot(rsv_n_1_9hpi, aes(x = polya_length)) +
  geom_histogram(bins = 30, fill = "blue",
                 color = "black", alpha = 0.7) +
  #xlab("Poly(A) tail length") +
  #ylab("Count") +
  theme_light() +
  # ylim(0, 1500) +
  labs(title = "HRSV (9_hpi, n=1)") +
  # limits = c(0, 1200) drops values outside this range (with a warning)
  scale_x_continuous(limits = c(0, 1200),
                     breaks = c(0, 200, 400, 600, 800, 1000)) +
  theme(plot.title = element_text(size = 15),
        axis.text = element_text(colour = "black", size = 13),
        axis.title.y = element_text(size = 13),
        legend.text = element_text(size = 13),
        strip.text.x = element_text(size = 13),
        axis.title.x = element_text(size = 13))
b   # print the plot
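One way to see what plot (C) actually drew is ggplot2's layer_data(), which returns the binned table behind the first layer. This is a sketch on hypothetical simulated data (df and the rexp() values are stand-ins for rsv_n_1_9hpi). As far as I know, stat_bin() computes its 30 bins over the scale range, which limits = c(0, 1200) fixes, so the bin width in plot (C) can differ from one computed over the data range, and any values above 1200 are removed before counting.

```r
library(ggplot2)

# Hypothetical stand-in data; substitute rsv_n_1_9hpi in practice
set.seed(1)
df <- data.frame(polya_length = rexp(5000, rate = 1 / 150))

# Same binning-related settings as plot (C), without the cosmetic theme
p <- ggplot(df, aes(x = polya_length)) +
  geom_histogram(bins = 30) +
  scale_x_continuous(limits = c(0, 1200))

# layer_data() returns the binned table that ggplot2 actually draws
binned <- layer_data(p)
nrow(binned)                       # number of bins drawn
binned$xmax[1] - binned$xmin[1]    # bin width, based on the 0-1200 scale range
sum(binned$count)                  # points inside the limits
sum(df$polya_length > 1200)        # points removed by the limits (with a warning)
```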

12 weeks ago
ATpoint 86k

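Your base R version has fewer bins than the ggplot one, so you would need to tell base R to use the same bins to make them identical. Because breaks = 30 in hist() is only a suggestion, the most reliable way is to give both functions the same vector of break points.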
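A minimal sketch of that approach, assuming the goal is identical bins in both plots: compute one vector of break points and pass it to both functions. A numeric vector given to hist(breaks = ...) is used as-is, and geom_histogram() accepts a breaks vector that overrides bins and binwidth. The data frame df and its rexp() values are hypothetical stand-ins for rsv_n_1_9hpi.

```r
library(ggplot2)

# Hypothetical stand-in data; substitute rsv_n_1_9hpi in practice
set.seed(1)
df <- data.frame(polya_length = rexp(5000, rate = 1 / 150))

# One shared set of 30 equal-width bins spanning the data range
brks <- seq(min(df$polya_length), max(df$polya_length), length.out = 31)

# Base R: a numeric vector of break points is used exactly as given
h <- hist(df$polya_length, breaks = brks,
          xlab = "Poly(A) tail length", ylab = "Count")

# ggplot2: pass the same boundaries through the 'breaks' argument
p <- ggplot(df, aes(x = polya_length)) +
  geom_histogram(breaks = brks, fill = "blue", colour = "black", alpha = 0.7) +
  theme_light()
print(p)

# Both use right-closed bins by default, so the counts should agree
gg_counts <- layer_data(p)$count
all.equal(as.numeric(h$counts), gg_counts)
```

If scale_x_continuous(limits = ...) is added back to the ggplot version, values outside the limits are dropped, so the counts would no longer match the base R plot.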


Thanks. But could you please explain why the height of the bars in the plot changes when I adjust the number of bins?

Is this explanation correct? The height of the bars changes because when you increase or decrease the number of bins, the data gets distributed across more or fewer bars.

More bins: the data is divided into smaller intervals, so each bin contains fewer data points, resulting in shorter bars.

Fewer bins: the data is grouped into larger intervals, so each bin contains more data points, resulting in taller bars.
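A small sketch of that explanation in numbers, on hypothetical normally distributed data (not the poly(A) data). With counts on the y axis, the bars always add up to the number of observations, so spreading them over more bins lowers the bars. hist() also returns a density, count / (n * bin width), whose total bar area is always 1; plotting with freq = FALSE uses that scale, which keeps bar heights comparable when the number of bins changes.

```r
# Hypothetical example data (not the poly(A) data from the question)
set.seed(7)
x <- rnorm(2000, mean = 300, sd = 80)

for (k in c(10L, 40L)) {
  h <- hist(x, breaks = k, plot = FALSE)
  # counts: the bars always add up to length(x), so more bins means shorter bars
  # density = counts / (n * bin width): the total bar area is always 1
  cat(sprintf("suggested = %d, actual = %d, sum(counts) = %d, area = %.3f\n",
              k, length(h$counts), sum(h$counts),
              sum(h$density * diff(h$breaks))))
}

# To plot densities instead of counts:
# hist(x, breaks = 40, freq = FALSE)
```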


If you took all the bars in one plot and stacked them on top of each other, their combined height would equal the total number of points in your dataset.

Here's a random set of 1,000 numbers drawn from a uniform distribution between 0 and 100. As you increase the number of bins, the average bar height decreases, because it is the total (1,000) divided by the number of bins.
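The same experiment can be re-created in a few lines. This is a sketch, not the code behind the original figure: the seed and the list of bin counts are arbitrary choices, and explicit equal-width break points are used so that each histogram has exactly the requested number of bins.

```r
set.seed(42)
x <- runif(1000, min = 0, max = 100)

n_bins <- c(5, 10, 30, 100)
summary_tbl <- t(sapply(n_bins, function(k) {
  # explicit equal-width break points, so there are exactly k bins
  h <- hist(x, breaks = seq(0, 100, length.out = k + 1), plot = FALSE)
  c(bins = length(h$counts), total = sum(h$counts),
    mean_height = mean(h$counts))
}))
summary_tbl
# total is 1000 in every row; mean_height is 1000 / bins
```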

