Hi there, I would like to filter my data frame, which has 5 columns: column 1 contains gene names, column 2 contains fold changes (expressed as logFC), column 3 contains the FDR-adjusted p-value, and the other two columns contain other things.
The thing is that genes can be duplicated in the data frame, so I would like to remove the duplicates. To do this, I group by gene and keep, among the duplicates, the row with the lowest FDR:
convertedata2 = convertedata %>% group_by(Geneid) %>% filter(FDR == min(FDR))
The problem is that some duplicates can share the same minimum FDR (e.g. if all rows for a gene have FDR = 1), so they are not filtered out. To remove them, I would like to filter on logFC as well and keep the row with the highest absolute logFC. So I changed the previous command to this:
convertedata2 = convertedata %>% group_by(Geneid) %>% filter(FDR == min(FDR)) %>% filter(logFC == max(abs(logFC)))
but the problem is that it doesn't work. I suspect it has to do with the abs function, but I am not sure why or what is going on.
Any help is much appreciated!
Thanks Luca
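A likely cause, sketched on made-up data: `logFC == max(abs(logFC))` compares the signed logFC against the largest absolute value, so any row whose logFC is negative can never match, and a gene whose strongest change is negative disappears entirely. Taking `abs()` on both sides of the comparison should avoid this. The toy values below are hypothetical, only the column names (Geneid, logFC, FDR) come from the post, and the `arrange()` + `distinct()` variant is an alternative I am suggesting, not something from the original command.

```r
library(dplyr)

# Hypothetical toy data; only the column names are taken from the post.
convertedata <- tibble(
  Geneid = c("geneA", "geneA", "geneB", "geneB", "geneC"),
  logFC  = c(-2.5, 1.0, 0.5, 3.0, -1.2),
  FDR    = c(1, 1, 0.01, 0.20, 0.05)
)

# Compare absolute values on both sides, so negative logFC rows can match.
convertedata2 <- convertedata %>%
  group_by(Geneid) %>%
  filter(FDR == min(FDR)) %>%
  filter(abs(logFC) == max(abs(logFC))) %>%
  ungroup()

# Alternative (an assumption about what is wanted): sort by FDR, then by
# descending |logFC|, and keep the first row per gene. This always returns
# exactly one row per gene, even if two rows tie on both FDR and |logFC|.
convertedata3 <- convertedata %>%
  arrange(FDR, desc(abs(logFC))) %>%
  distinct(Geneid, .keep_all = TRUE)
```

With the original command, geneA and geneC would be dropped from this toy table because their retained logFC values are negative. Note that the `filter()` version can still return more than one row per gene when rows tie on both FDR and absolute logFC, which is why the second variant may be safer if strictly one row per gene is needed.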
Thanks rpolicastro! You are always super helpful!