Hi all, I need to perform the following Quality Control checks: 1) allele frequency (AF>5% & <95%), 2) info >0.8, 3) remove all duplicate position SNPs & 4) multi-allelic SNPs.
I am currently using R to do that but because this is not exactly my field I have got a bit stuck.
For checks 1 and 2, I have found the "snpReady" package, which seems ideal and also gives me a report: https://cran.r-project.org/web/packages/snpReady/vignettes/snpReady-vignette.html#QC
The code I used is as follows:
install.packages("snpReady")
library(snpReady)
library(impute)
library(Matrix)
library(matrixcalc)
library(stringr)
library(rgl)
df <- read.csv("somedf.csv", header = TRUE, na.strings = "NA")
head(df)
dim(df)
geno.ready <- raw.data(as.matrix(df), frame = "long", base = TRUE, sweep.sample = 0.8, call.rate = 0.95, maf = 0.05, imput = FALSE)
However, I can't seem to make it work as it gives me this error:
"Error in raw.data(data = as.matrix(geno), frame = "long", base = TRUE, : could not find function "raw.data""
Do you know why this might be? Would you recommend any other packages or software that would make these checks easier?
Thank you! Silvia
EDIT: I managed to fix the error; it worked well on a different computer after updating all the packages and installing the ones I needed individually. However, I can no longer use this script, as it only allows a data frame with 4 columns and mine is larger than that. I can't seem to find a suitable package. Do you have any to recommend?
My data has the following headers: ID, SNP (e.g., 1:15791:C:T), Allele 1 (i.e., minor allele, e.g., C), Allele 2 (e.g., T), beta, pvalue, Chromosome (e.g., 1), Base-pair position (e.g., 15791).
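For checks 3 and 4, a minimal base-R sketch, assuming the chromosome and position columns are named CHR and BP (hypothetical names; the real headers may differ) and that multi-allelic SNPs show up as several rows sharing one position. It drops every row whose chromosome:position occurs more than once, which would remove duplicate positions and multi-allelic sites together.

```r
# Toy data standing in for the real table; column names and values are assumptions.
df <- data.frame(
  SNP = c("1:15791:C:T", "1:15791:C:G", "1:20000:A:G", "2:500:G:A", "2:500:G:A"),
  A1  = c("C", "C", "A", "G", "G"),
  A2  = c("T", "G", "G", "A", "A"),
  CHR = c(1, 1, 1, 2, 2),
  BP  = c(15791, 15791, 20000, 500, 500),
  stringsAsFactors = FALSE
)

# Build a chromosome:position key for every row.
pos_key <- paste(df$CHR, df$BP, sep = ":")

# Flag every row whose key occurs more than once
# (all copies are flagged, not only the later ones).
is_repeated <- pos_key %in% pos_key[duplicated(pos_key)]

# Keep only rows at positions that appear exactly once.
df_unique <- df[!is_repeated, ]
print(df_unique)
```

If only the extra copies should go while one row per position stays, `df[!duplicated(pos_key), ]` would be the variant to use instead.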
Can you confirm that you've successfully installed "snpReady"? snpReady seems to depend on "impute", which is on Bioconductor instead of CRAN and needs to be installed manually from there.
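One way to check this, as a sketch: the helper below (missing_packages is a hypothetical name) lists which of the packages loaded in the question are not installed. If library(snpReady) fails because a dependency is absent, raw.data is never attached, which would produce a "could not find function" error. The commented install lines assume BiocManager is used to reach Bioconductor.

```r
# Return the names of packages that are not found in the current library paths.
# system.file() returns "" for a package that is not installed.
missing_packages <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  unname(pkgs[!nzchar(paths)])
}

# Packages loaded in the question, plus digest (needed by rgl).
needed <- c("snpReady", "impute", "Matrix", "matrixcalc", "stringr", "rgl", "digest")
print(missing_packages(needed))

# To install what is missing (left commented out because it needs network access):
# install.packages("BiocManager")
# BiocManager::install("impute")   # impute is hosted on Bioconductor
# install.packages(c("snpReady", "digest"))
```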
Hi Sam, thank you for your reply! Yes, I have. I noticed the "impute" issue but I installed it successfully as I came across the same code you've posted. The rgl package however gave me an error: "package or namespace load failed for ‘rgl’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘digest’". In addition, I also have these warning messages: "package built under R version 3.5.3". I am trying to sort these out but in general I don't think they can explain my "could not find function "raw.data""-error which is the most concerning one.
If that's the case, you will also need to manually install the digest package (you can simply use install.packages("digest")).
Thank you, I actually managed to make it work, but that script only allows a data frame with 4 columns whereas mine has more, so I cannot use it. What scripts or software do you normally use to carry out similar checks?
I'll usually just read in the data.frame and do the filtering manually myself.
E.g. if my data.frame (df) has an INFO column and a MAF column and I want to filter by INFO > 0.8 and MAF > 0.05, I can subset the rows on those two conditions.
Thank you, I think I am overcomplicating this then! I will try to do it manually too.
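The manual filtering described above can be sketched as follows, assuming df has numeric columns named INFO and MAF as in the reply (the toy values are made up for illustration). Rows with missing values are treated as failing the filter.

```r
# Toy data; the SNP names and values are illustrative only.
df <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  INFO = c(0.95, 0.70, 0.90, NA),
  MAF  = c(0.20, 0.30, 0.01, 0.25),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons, so rows with a missing INFO or MAF are excluded
# instead of turning into all-NA rows.
keep <- which(df$INFO > 0.8 & df$MAF > 0.05)
df_qc <- df[keep, ]
print(df_qc)
```

If the table holds an allele frequency column instead of MAF, the matching condition for the check in the question would be AF > 0.05 & AF < 0.95.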