2.3 years ago
aj123
Hi,
I have a table like below:
patient, geneid, base, count
"ptp_1", "BRCA1", "C", 123
"ptp_1", "BRCA1", "G", 2
"ptp_1", "BRCA1", "T", 55
"ptp_2", "BRCA2", "A", 303
"ptp_2", "BRCA2", "C", 11
"ptp_2", "BRCA2", "G", 1
How can I generate a wide data.frame that has one row per {patient x gene} and one column for each base's count?
For example:
participant gene A_count C_count G_count T_count
"ptp_1" "BRCA1" <values>
"ptp_1" "BRCA2"
"ptp_2" "BRCA1"
"ptp_2" "BRCA2"
I tried the following in dplyr but am not getting the exact result:
clean_df_mut_counts_wide <- clean_df_mut_counts %>%
  filter(base == "A") %>%
  group_by(participant) %>%
  group_by(gene) %>%
  summarise(A_count = sum(as.factor(base == "A")))
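For what it's worth, one way to get this shape is `tidyr::pivot_wider()` rather than filtering one base at a time. Below is a minimal sketch, not a definitive answer. It assumes the columns are named `participant`, `gene`, `base` and `count` as in the attempt above (the table header says `patient` and `geneid`, so adjust as needed), and that a base missing for a patient/gene pair should show up as 0.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question; the column names are an assumption.
clean_df_mut_counts <- data.frame(
  participant = c("ptp_1", "ptp_1", "ptp_1", "ptp_2", "ptp_2", "ptp_2"),
  gene        = c("BRCA1", "BRCA1", "BRCA1", "BRCA2", "BRCA2", "BRCA2"),
  base        = c("C", "G", "T", "A", "C", "G"),
  count       = c(123, 2, 55, 303, 11, 1),
  stringsAsFactors = FALSE
)

clean_df_mut_counts_wide <- clean_df_mut_counts %>%
  # Collapse any duplicate participant/gene/base rows into one total
  group_by(participant, gene, base) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  # One column per base, named A_count, C_count, G_count, T_count
  pivot_wider(
    names_from  = base,
    values_from = count,
    names_glue  = "{base}_count",
    names_sort  = TRUE,
    values_fill = 0   # assumption: an unobserved base means a count of 0
  )

print(clean_df_mut_counts_wide)
```

This only returns patient/gene pairs that are present in the input. If rows for pairs that never occur (such as ptp_1 with BRCA2) are also wanted, `tidyr::complete(participant, gene)` may be one option to add them.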
Thanks! I'm trying to calculate the base frequencies and the second most frequent base like this, but it gives me a table without the patient and gene columns:
I do not see which frequency you would like in the output. Could you give an example?
I want the frequencies of the bases shown above, and I also want to find the second most frequently occurring base in each patient. Hope this clarifies. Thank you.
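If "frequency" here means each base's share of the total count within a patient/gene pair (that reading is an assumption), a sketch along these lines keeps the patient and gene columns, because `mutate()` on a grouped data frame adds a column instead of collapsing rows the way `summarise()` does. Column names are assumed to be `participant`, `gene`, `base` and `count`.

```r
library(dplyr)

# Toy data copied from the question; the column names are an assumption.
df <- data.frame(
  participant = c("ptp_1", "ptp_1", "ptp_1", "ptp_2", "ptp_2", "ptp_2"),
  gene        = c("BRCA1", "BRCA1", "BRCA1", "BRCA2", "BRCA2", "BRCA2"),
  base        = c("C", "G", "T", "A", "C", "G"),
  count       = c(123, 2, 55, 303, 11, 1),
  stringsAsFactors = FALSE
)

base_freq <- df %>%
  group_by(participant, gene) %>%
  # Share of each base within its participant/gene group
  mutate(freq = count / sum(count)) %>%
  ungroup()

print(base_freq)
```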
I was able to achieve the above with the following, after pivoting:
I am still not sure how to find the second most frequently occurring base per patient. I tried the following, but it is not working:
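The poster's attempt is not shown here. One possible approach, working on the long (unpivoted) table, is to total the counts per patient and base, sort each patient's bases by count, and take the second row. This is only a sketch: it assumes columns named `participant`, `base` and `count`, and it breaks ties by whatever order `arrange()` leaves them in.

```r
library(dplyr)

# Toy data copied from the question; the column names are an assumption.
df <- data.frame(
  participant = c("ptp_1", "ptp_1", "ptp_1", "ptp_2", "ptp_2", "ptp_2"),
  gene        = c("BRCA1", "BRCA1", "BRCA1", "BRCA2", "BRCA2", "BRCA2"),
  base        = c("C", "G", "T", "A", "C", "G"),
  count       = c(123, 2, 55, 303, 11, 1),
  stringsAsFactors = FALSE
)

second_base <- df %>%
  # Total count of each base per participant (across genes)
  group_by(participant, base) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  group_by(participant) %>%
  # Highest count first within each participant
  arrange(desc(count), .by_group = TRUE) %>%
  # Second row per participant; participants with one base are dropped
  slice(2) %>%
  ungroup()

print(second_base)
```

To do this per patient and gene instead, adding `gene` to both `group_by()` calls should be enough.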