Hello, I have a list of gene names with several features; each row represents a gene and its features. There are approximately 15,000 rows and 11 columns. Some genes appear more than once (for example, there are 4 TP53 entries). I want to see how many times each gene name is duplicated, and I want to use that value. Duplicated gene names are listed one under the other. As an example:

Gene name   rs_id   aa change
CASP7       xx      yy
TP53        zz      hh
TP53        ff      cc
TP53        bb      gg
WNT         aa      dd
WNT         qq      kk

I want to find the number of duplicates for each gene (4 for TP53 and 2 for WNT), and I also want to check the aa change for each duplicate. Is there a way to do this in R? Thanks in advance.
You can try the plyr library; see my post on the Bioconductor support site: https://support.bioconductor.org/p/71837/#71839
Thank you for your answer, I used the code below:
library(dplyr)
newdf <- df %>%
  group_by(ID) %>%
  mutate(replicate = seq(n()))
However, I want a single number per gene repeated on every row (for example, if a gene occurs 6 times, the column should read 6,6,6,6,6,6, not 1,2,3,4,5,6). Could you suggest a way to do this?
Try the count function from plyr.
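A minimal sketch of how this could look, assuming a data frame `df` whose gene column is called `ID` (as in the dplyr snippet above). The toy data and the names `aa_change`, `n_dup`, `gene_counts` and `dup_changes` are made up for illustration. The base R `ave()` call puts the group size on every row, and `plyr::count()` gives one row per gene with its count in a column named `freq`; the plyr part only runs if the package is installed.

```r
# Toy data mirroring the example in the question (column names are assumptions).
df <- data.frame(
  ID        = c("CASP7", "TP53", "TP53", "TP53", "WNT", "WNT"),
  rs_id     = c("xx", "zz", "ff", "bb", "aa", "qq"),
  aa_change = c("yy", "hh", "cc", "gg", "dd", "kk"),
  stringsAsFactors = FALSE
)

# Base R: repeat the group size on every row of that gene
# (3,3,3 for TP53 here, rather than 1,2,3).
df$n_dup <- ave(seq_along(df$ID), df$ID, FUN = length)
print(df)

# One row per gene with its count, via plyr::count (count column is "freq").
if (requireNamespace("plyr", quietly = TRUE)) {
  gene_counts <- plyr::count(df, vars = "ID")
  print(gene_counts)
}

# aa changes of the genes that occur more than once, grouped by gene.
is_dup <- df$n_dup > 1
dup_changes <- split(df$aa_change[is_dup], df$ID[is_dup])
print(dup_changes)
```

In the dplyr pipeline posted above, replacing `seq(n())` with `n()` inside `mutate()` should give the same repeated count per gene.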