Hi , im trying to preorm SKAT test (https://rdrr.io/cran/SKAT/man/SKAT.html) for every gene in my data . in order to do that for every gene i created the genetype matrix which has 21 rows ( as number of patients) and columns as number of SNPs for every gene . this is my code :
disease_vector <- rep(0, 21)
disease_vector[1:8] <- 0
disease_vector[9:21] <- 1
gene_results <- list()
# get the unique genes in the data frame
genes <- unique(snps_encoded$Gene.refGene_ANNOVAR)
# loop through each gene
for (gene in genes) {
# subset the data frame to include only the SNPs for the current gene
gene_df <- snps_encoded[snps_encoded$Gene.refGene_ANNOVAR == gene, ]
# count the number of SNPs for the current gene
num_snps <- nrow(gene_df)
# skip genes with only one SNP
if (num_snps == 1) {
next
}
# create a matrix to store the encoding information
encoding_matrix <- matrix(0, nrow = 21, ncol = num_snps)
# loop through each SNP for the current gene
for (i in 1:num_snps) {
# get the SNP ID and encoding information
snp_id <- gene_df[i, "snp_id"]
encoding_0 <- gene_df[i, "encoding_0"]
encoding_1 <- gene_df[i, "encoding_1"]
encoding_2 <- gene_df[i, "encoding_2"]
# set the values in the encoding matrix based on the encoding information
encoding_matrix[, i] <- c(rep(0, encoding_0), rep(1, encoding_1), rep(2, encoding_2))
}
# store the results for the current gene in the list
gene_results[[gene]] <- list(encoding_matrix = encoding_matrix)
}
for example this is the out put for gene :"AGL"
$encoding_matrix
[,1] [,2]
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 0 0
[5,] 0 0
[6,] 0 0
[7,] 0 0
[8,] 0 0
[9,] 0 0
[10,] 0 0
[11,] 0 0
[12,] 0 0
[13,] 0 0
[14,] 0 0
[15,] 0 0
[16,] 0 0
[17,] 0 0
[18,] 0 0
[19,] 0 0
[20,] 0 0
[21,] 1 1
so it has 2 SNPs and the values in the encoding matrix 0, 1, 2 for AA, Aa, aa, where A is a major allele and a is a minor allele. the out come disease vector has values 1 if the person has severe disease and 0 if mild ( there are 8 mild and 13 severe patients).
than i run SKAT test for every gene :
# create an empty data frame to store the results
skat_results <- data.frame(gene_name = character(), p_value = numeric())
# loop through each gene
for (gene in genes) {
# skip genes with only one SNP
if (num_snps == 1) {
next
}
# extract the encoding matrix and disease vector for the current gene
encoding_matrix <- gene_results[[gene]]$encoding_matrix
# run the SKAT test
obj <- SKAT_Null_Model(disease_vector ~ 1, out_type="D")
skat_p_value <- SKAT(encoding_matrix, obj)$p.value
# add the gene name and p-value to the results data frame
skat_results <- rbind(skat_results, data.frame(gene_name = gene, p_value = skat_p_value))
}
- skat_results <- rbind(skat_results, data.frame(gene_name = gene, p_value = skat_p_value))
- } Sample size (non-missing y and X) = 21, which is < 2000. The small sample adjustment is applied! Error in SKAT_MAIN_Check_Z(Z, obj.res$n.all, obj.res$id_include, SetID, : Dimensions of y and Z do not match
cant seem to understand where is it coming from and how to deal with it as Z is a matrix which has 21 rows and column that change according to the number of SNPs in every gene . and diasease_vectore is a vector length 21
I'm not sure if this is what's causing your error, but you don't define num_snps in the gene loop for the code block where you run the SKAT test on every gene (there's a difference between your first and third code block)