5.8 years ago
zizigolu
★
4.3k
Hi,
I have calculated the mean, mean - SD and mean + SD for a bunch of samples, based on some negative controls. I want to show which samples fall inside and outside the + and - SD range, but I don't know how to highlight (bold) those samples, something like this.
Any help, please?
> head(data)
sample mean mean+sd mean-sd
1: A2 -1.210713 1.541450 -3.9628767
2: A3 3.125620 5.877783 0.3734567
3: A4 2.687265 5.439429 -0.0648978
4: A6 4.989040 7.741203 2.2368766
5: A7 -1.194626 1.557537 -3.9467896
6: A8 -1.628225 1.123939 -4.3803880
>
I would suggest you use a violin plot with a swarm plot, rather than a box plot, e.g.:
See also this tweet for what can be wrong with boxplots:
More plotting suggestions can be found in this blog post.
Thank you. I need the names of the highlighted samples so that I can exclude the samples that fall outside this range.
But then why do you need a plot? Just use the values and set cutoffs...
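As a rough illustration of the cutoff approach, here is a minimal base R sketch. It uses only the six sample means shown in the `head(data)` output above, and it assumes the cutoff is the grand mean of the sample means plus or minus one SD of those means (computed here from the six rows only, so the numbers will differ from the full dataset). The names `dat`, `in_range` and `flagged` are made up for this example.

```r
# Sample means copied from the head(data) output in the question (first six rows only).
dat <- data.frame(
  sample = c("A2", "A3", "A4", "A6", "A7", "A8"),
  mean   = c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225),
  stringsAsFactors = FALSE
)

# Assumption: cutoff = grand mean of the sample means +/- 1 SD of those means.
grand_mean <- mean(dat$mean)
grand_sd   <- sd(dat$mean)
lower <- grand_mean - grand_sd
upper <- grand_mean + grand_sd

# TRUE when a sample mean lies inside [lower, upper].
dat$in_range <- dat$mean >= lower & dat$mean <= upper

# Names of the samples outside the range, i.e. candidates for exclusion.
flagged <- dat$sample[!dat$in_range]

print(dat)
print(flagged)
```

The `flagged` vector gives the sample names directly, so no plot is needed to read them off.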
By eye, it looks like all samples are probably in range. However, I have two datasets, and I need to compare which of the matched samples between the two datasets deviates more from this range :(
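One possible way to compare matched samples across two datasets is to express each sample's distance from its own dataset's grand mean in SD units, then compare those distances. The sketch below is only an illustration: `sd_distance` is a hypothetical helper, `set1` reuses the six means from the question, and `set2` contains made-up values standing in for the second dataset.

```r
# Hypothetical helper: distance of each sample mean from the grand mean, in SD units.
sd_distance <- function(x) abs(x - mean(x)) / sd(x)

# Dataset 1: sample means from the question (first six rows only).
set1 <- c(A2 = -1.210713, A3 = 3.125620, A4 = 2.687265,
          A6 = 4.989040, A7 = -1.194626, A8 = -1.628225)
# Dataset 2: made-up values, for illustration only.
set2 <- c(A2 = 0.5, A3 = 2.0, A4 = 6.0, A6 = 3.0, A7 = 1.0, A8 = 1.5)

# Keep only samples present in both datasets.
shared <- intersect(names(set1), names(set2))

cmp <- data.frame(
  sample = shared,
  dist1  = sd_distance(set1)[shared],
  dist2  = sd_distance(set2)[shared],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# For each matched sample, report the dataset in which it sits further from the grand mean.
cmp$more_deviant <- ifelse(cmp$dist1 > cmp$dist2, "dataset1", "dataset2")
print(cmp)
```

A distance above 1 in either column means that sample is outside the plus or minus one SD range in that dataset.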
A side note: mean should be accompanied by "standard error of mean (SEM)" and not the standard deviation
Could you explain why? I mean, the SEM is used (colloquially) to indicate how far the sample mean is likely to be from the population mean. What relevance is that when trying to decide whether an individual sampled point is an outlier within a given sample from the population?
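For reference, the two quantities being contrasted differ by a factor of sqrt(n): the SD describes the spread of the individual values, while the SEM describes the uncertainty of the estimated mean. A small base R sketch, using the six sample means from the question purely as example input (the variable names are arbitrary):

```r
# Example input: the six sample means shown in the question.
x <- c(-1.210713, 3.125620, 2.687265, 4.989040, -1.194626, -1.628225)

n   <- length(x)
s   <- sd(x)         # standard deviation: spread of the individual values
sem <- s / sqrt(n)   # standard error of the mean: uncertainty of mean(x)

print(c(sd = s, sem = sem))
```

Because the SEM shrinks as n grows, a mean plus or minus SEM band gets narrower with more samples, whereas a mean plus or minus SD band does not, which is why the SD is the relevant scale for judging individual points.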
Actually, this is an HTG EdgeSeq assay. In this assay we have 4 negative probes with which we can check the quality of a sample. The mean of the raw counts assigned to the negative probes in each sample should fall within plus or minus one standard deviation of the overall mean of the sample means. I want to identify bad samples as outliers.
Sorry, I wasn't referring to you F. It was the advice from Santosh that I was questioning
Hi russhh, and sorry for the late follow-up. I see your point and you are right. I looked at the problem hurriedly, and as the OP was calculating sample means and SDs in the data above, I thought that it was about putting error bars on the estimates of the sample means. But since it is about outlier detection of sample points, the SD of the samples is the correct approach. However, I am not sure how much power a 1-SD cutoff will give for identifying outliers, as about one third of the data lies outside 1 SD in any normal distribution.
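The "about one third" figure can be checked with the normal distribution function in base R. This sketch also shows the corresponding fractions for 2 and 3 SD for comparison; it assumes the values are normally distributed.

```r
# Fraction of a normal distribution lying more than k SDs from the mean (both tails).
k <- c(1, 2, 3)
frac_outside <- 2 * pnorm(-k)
names(frac_outside) <- paste0(k, "sd")

print(round(frac_outside, 4))
```

For k = 1 this gives roughly 0.32, so a 1-SD cutoff would flag nearly a third of perfectly normal samples.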