Calculating the frequency of every pathogenic germline variant in every disease cohort?
2.5 years ago
Hasan • 0

I am trying to calculate the frequency of every pathogenic germline variant in every disease cohort. For example, variant 1:17588689 has 20 heterozygous samples (column H), and I need to report what percentage of these samples are in the glioma cohort, what percentage are in the meningioma cohort, and so on. That information should be appended for each disease cohort (meningioma, glioma, schwannoma, pituitary adenoma, others) as new columns after the last column. I also have to make sure that the cohort proportions for each variant sum to 1. In my dataset, the heterozygous sample IDs are listed in column AZ, titled "HetSamples", and the IDs are separated by commas. I am stuck at this point, so I would appreciate it if someone could help me complete it.

library(stringr)
library(tidyr)
library(dplyr)

# Disease cohort information
diseaseData <- read.delim(".../.txt", header = TRUE, sep = "\t")
# Variant table; HetSamples holds comma-separated sample IDs
variantData <- read.delim(".../.txt", header = TRUE, sep = "\t")

# Expand to one row per (variant, heterozygous sample)
variantData <- variantData %>%
  mutate(HetSamples = strsplit(as.character(HetSamples), ",")) %>%
  unnest(HetSamples)

# Split both tables by whether the sample ID contains "U"
variantDataOld <- variantData %>%
  filter(!str_detect(HetSamples, 'U'))
variantDataNew <- variantData %>%
  filter(str_detect(HetSamples, 'U'))
diseaseDataOld <- diseaseData %>%
  filter(!str_detect(ClinicalSeqID, 'U'))
diseaseDataNew <- diseaseData %>%
  filter(str_detect(ClinicalSeqID, 'U'))

# Inspect the "-"-separated parts of the IDs (the results are not assigned)
data.frame(do.call("rbind", strsplit(as.character(variantDataOld$HetSamples), "-", fixed = TRUE)))
data.frame(do.call("rbind", strsplit(as.character(diseaseDataOld$ClinicalSeqID), "-", fixed = TRUE)))

# Store the first three "-"-separated parts of each ID as new columns
variantDataOld[c('Col1', 'Col2', 'Col3')] <- str_split_fixed(variantDataOld$HetSamples, '-', 3)
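
One possible way to finish the calculation described in the question is sketched below on toy data. It is only an illustration under several assumptions: the column names `Variant` and `Disease` are hypothetical (the post names only `HetSamples` and `ClinicalSeqID`), the toy sample IDs are invented, and the sample IDs in the two tables are assumed to match exactly. It does not handle the "U" and "-" ID reformatting attempted above.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the real tables.
# `Variant` and `Disease` are assumed column names, not taken from the post.
variantData <- data.frame(
  Variant    = c("1:17588689", "2:1000"),
  HetSamples = c("S1,S2,S3,S4", "S2,S5"),
  stringsAsFactors = FALSE
)
diseaseData <- data.frame(
  ClinicalSeqID = c("S1", "S2", "S3", "S4", "S5"),
  Disease       = c("glioma", "glioma", "meningioma", "schwannoma", "others"),
  stringsAsFactors = FALSE
)

cohortFreq <- variantData %>%
  # one row per (variant, heterozygous sample)
  separate_rows(HetSamples, sep = ",") %>%
  mutate(HetSamples = trimws(HetSamples)) %>%
  # attach each sample's cohort
  left_join(diseaseData, by = c("HetSamples" = "ClinicalSeqID")) %>%
  # keep samples without a cohort visible instead of dropping them silently
  mutate(Disease = ifelse(is.na(Disease), "unmatched", Disease)) %>%
  # samples per variant and cohort
  count(Variant, Disease, name = "n") %>%
  # proportion of each variant's heterozygous samples in each cohort
  group_by(Variant) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  # one column per cohort; cohorts with no samples get 0
  pivot_wider(names_from = Disease, values_from = prop,
              values_fill = list(prop = 0))

# Append the cohort columns after the last column of the original table
result <- left_join(variantData, cohortFreq, by = "Variant")

# Sanity check: the cohort proportions of every variant should sum to 1
cohortCols <- setdiff(names(cohortFreq), "Variant")
stopifnot(all(abs(rowSums(result[cohortCols]) - 1) < 1e-9))
```

Because every heterozygous sample is counted in exactly one category (a real cohort or the placeholder `unmatched`), the per-variant proportions sum to 1 by construction. A non-zero `unmatched` column would suggest that some IDs differ in format between the two tables and need to be harmonized before joining.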
r bioconductor biostatistics • 507 views