This may sound simple, but I cannot figure out the answer.
I have a dataset in R with 26 samples in rows and many variables (>20) in columns. Some of the variables are categorical, so I need to run a Kruskal-Wallis test for each numerical variable grouped by each categorical one. So I do:
env_fact <- read.csv("environ_facts.csv")
kruskal.test(Numeric_var ~ Categorical_var, data = env_fact)  # Numeric_var and Categorical_var are placeholder column names
But this way I can only test the numerical variables one by one, which is tiresome.
Is there any way to run the Kruskal-Wallis tests for all numerical variables at once? I can repeat the process for each categorical variable, since I only have 4, but there are more than 20 numerical ones!
Thanks a lot
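A minimal sketch of the "all tests at once" idea with nested for loops, assuming the data frame looks like the one described (26 rows, numeric and categorical columns). The toy data and the column names `pH`, `temperature`, `nitrate`, `site` and `season` are hypothetical stand-ins for `environ_facts.csv`. It uses the non-formula form `kruskal.test(x, g)` and collects one row per (numeric, categorical) pair:

```r
# Hypothetical stand-in for env_fact <- read.csv("environ_facts.csv"):
# 26 samples (rows); column names below are invented for illustration.
set.seed(42)
env_fact <- data.frame(
  pH          = rnorm(26, mean = 7, sd = 0.5),
  temperature = rnorm(26, mean = 15, sd = 3),
  nitrate     = rexp(26, rate = 2),
  site        = rep(c("A", "B", "C"), length.out = 26),
  season      = rep(c("spring", "summer", "autumn", "winter"), length.out = 26)
)

# Split columns by type. Since R 4.0, read.csv() keeps text columns as
# character, so character and factor columns are both treated as categorical.
num_vars <- names(env_fact)[sapply(env_fact, is.numeric)]
cat_vars <- names(env_fact)[sapply(env_fact, function(x) is.character(x) || is.factor(x))]

# One Kruskal-Wallis test per (numeric, categorical) pair
results <- data.frame()
for (g in cat_vars) {
  for (v in num_vars) {
    kt <- kruskal.test(x = env_fact[[v]], g = factor(env_fact[[g]]))
    results <- rbind(results, data.frame(
      numeric_var     = v,
      categorical_var = g,
      statistic       = unname(kt$statistic),
      df              = unname(kt$parameter),
      p_value         = kt$p.value
    ))
  }
}
print(results)
```

Growing a data frame with `rbind()` inside a loop is slow for large jobs, but for roughly 20 x 4 = 80 tests it keeps the code easy to read.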
I don't see the bioinformatics relevance, but might this help?
A Tutorial on Loops in R - Usage and Alternatives
Correct for multiple testing afterwards: more than 20 numerical variables times 4 categorical ones means over 80 tests.
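A short sketch of that correction using base R's `p.adjust()`. The three p-values and their names are hypothetical, standing in for the p-values collected from the individual Kruskal-Wallis tests:

```r
# Hypothetical raw p-values, e.g. one per Kruskal-Wallis test
p_raw <- c(pH = 0.01, temperature = 0.04, nitrate = 0.30)

# Benjamini-Hochberg: controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")
print(p_bh)    # pH 0.03, temperature 0.06, nitrate 0.30

# Bonferroni: more conservative, controls the family-wise error rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(p_bonf)  # pH 0.03, temperature 0.12, nitrate 0.90
```

Which method is appropriate depends on the study; BH is a common choice when screening many environmental variables.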
Hi, just as a side note: with 26 samples and more than 20 variables, you have roughly as many variables as observations, so your problem might be "under-defined" for estimating variable importance. I am not sure your statistics will be robust enough to draw reliable conclusions.
Thanks, Michael.
If you test all your variables at once, the result may not be robust enough. However, what I want to do is test each variable separately, so each test has one variable and 26 observations, split into groups by one of the categorical parameters (3 or more groups, which is why Kruskal-Wallis is needed instead of Mann-Whitney).
My problem is finding a command that runs this for every variable at once, instead of writing the same code for each one (i.e. writing the same code 20 times).
Regards
As already suggested, this seems like it can be solved with a simple for loop or a vectorized equivalent. If not, you'll have to explain why.
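A minimal sketch of the vectorized equivalent for a single grouping variable, assuming hypothetical toy data (the columns `pH`, `temperature`, `nitrate` and `site` are invented). `vapply()` runs the test on every numeric column and returns a named vector of p-values:

```r
# Hypothetical toy data: 26 samples, three numeric variables, one grouping column
set.seed(1)
env_fact <- data.frame(
  pH          = rnorm(26, 7, 0.5),
  temperature = rnorm(26, 15, 3),
  nitrate     = rexp(26, 2),
  site        = rep(c("A", "B", "C"), length.out = 26)
)

num_vars <- names(env_fact)[sapply(env_fact, is.numeric)]

# Iterate over the numeric columns; each call returns one p-value,
# and vapply() keeps the column names on the result.
p_site <- vapply(env_fact[num_vars],
                 function(x) kruskal.test(x, g = factor(env_fact$site))$p.value,
                 numeric(1))
print(p_site)
```

Repeating the call with another grouping column (e.g. a hypothetical `season`) covers the remaining categorical variables.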
As you can probably tell, I am very new to programming, so I do not know how to create a for loop with my data.
Let's say I have variables (both numerical and categorical) in columns and samples in rows. Then I have to test each numeric variable against each categorical one, with the samples as the observations.
Then, I should try something like:
Right?
Almost. So this is an R programming question. Have a look at the tutorial linked above and/or any tutorial on R programming to get started. You need to know how to access variables in an R data frame. Look at something like this (not using the formula version of the function and assuming variables have names in column headers):
Sounds great! However, I still cannot see all the Kruskal-Wallis tests.
If I try something similar to:
I only get one p-value, which I suppose corresponds to the test carried out as a whole rather than variable by variable...
How could I get all the tests (i.e. each variable separately) on screen?
Thank you very much, your help is really moving my analysis forward!
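One common reason for seeing a single result is that R does not auto-print values inside a `for` loop, or that each iteration overwrites the same variable. A minimal sketch, assuming hypothetical toy data with invented columns `pH`, `temperature` and `site`, keeps every test object in a named list and prints each one explicitly:

```r
# Hypothetical toy data standing in for the real environmental table
set.seed(7)
env_fact <- data.frame(
  pH          = rnorm(26, 7, 0.5),
  temperature = rnorm(26, 15, 3),
  site        = rep(c("A", "B", "C"), length.out = 26)
)
num_vars <- c("pH", "temperature")

# Keep the full test objects instead of only the last p-value
tests <- lapply(num_vars, function(v) {
  kt <- kruskal.test(env_fact[[v]], g = factor(env_fact$site))
  kt$data.name <- paste(v, "by site")  # readable label in the printed output
  kt
})
names(tests) <- num_vars

# Values are not auto-printed inside a loop, so call print() explicitly
for (v in names(tests)) {
  cat("=== ", v, " ~ site ===\n", sep = "")
  print(tests[[v]])
}
```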
I think you need to go and read about how to program in R to understand and reuse pieces of code. At this stage, I think it would be a disservice to code it entirely for you.
Thank you very much Jean-Karim.
I will try to figure out how to finish the analysis the way I need. As you can see, I am very new to R, so although I could understand almost everything in your code, I wouldn't be able to write it even in a million years ;-)
Cheers
Thanks, @carambakaracho
Yes, I did not explain the bioinformatics side of the topic.
I need this analysis for a microbial ecology study, to find out which environmental factors influence (or not) the community composition. I thought others might have the same question.
Nonetheless, I am going to try what you suggested and post the result afterwards.
Cheers