Genetic QC checks in R - packages and errors
4.8 years ago
Silvia ▴ 20

Hi all, I need to perform the following quality control checks: 1) allele frequency (AF > 5% and < 95%), 2) info > 0.8, 3) remove all SNPs at duplicated positions, and 4) remove multi-allelic SNPs.

I am currently using R to do that but because this is not exactly my field I have got a bit stuck.

For checks 1 and 2, I have found the "snpReady" package, which seems ideal and also gives me a report: https://cran.r-project.org/web/packages/snpReady/vignettes/snpReady-vignette.html#QC

The code I used is as follows:

install.packages("snpReady")
library(snpReady)
library(impute)
library(Matrix)
library(matrixcalc)
library(stringr)
library(rgl)

df <- read.csv("somedf.csv", header = TRUE,  na.strings = "NA")
head(df)
dim(df)
geno.ready <- raw.data(as.matrix(df), frame = "long", base = TRUE, sweep.sample = 0.8, call.rate = 0.95, maf = 0.05, imput = FALSE)

However, I can't seem to make it work as it gives me this error:

"Error in raw.data(data = as.matrix(geno), frame = "long", base = TRUE, : could not find function "raw.data""

Do you know why this might be? Would you recommend any other packages or software that would let me do my checks more easily?

thank you! Silvia

EDIT: I managed to fix the error: it worked well on a different computer after updating all the packages and individually installing the ones I needed. However, I can no longer use this script, as it only accepts a data frame with 4 columns and mine has more than that. I can't seem to find a suitable package. Do you have any to recommend?

My data has the following headers: ID, SNP (e.g., 1:15791:C:T), Allele 1 (i.e., minor allele, e.g., C), Allele 2 (e.g., T), beta, pvalue, Chromosome (e.g., 1), Base-pair position (e.g., 15791).


Can you confirm that you've successfully installed "snpReady"? snpReady seems to depend on "impute", which is on Bioconductor instead of CRAN and needs to be installed manually with

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("impute")
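If `library(snpReady)` fails part-way (for example because a dependency is missing), any later call to `raw.data()` will report "could not find function". A minimal sketch for checking this, assuming only base R; `has_function` is a hypothetical helper name used for illustration, not part of snpReady:

```r
# Hypothetical helper: TRUE if `pkg` can be loaded and exports `fn`.
has_function <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # package is missing or its namespace failed to load
  }
  fn %in% getNamespaceExports(pkg)
}

has_function("stats", "sd")           # sanity check on a base package
has_function("snpReady", "raw.data")  # FALSE suggests a failed install or load
```

If the second call returns FALSE, the output of `library(snpReady)` itself is the place to look for the underlying error.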

Hi Sam, thank you for your reply! Yes, I have. I noticed the "impute" issue but I installed it successfully as I came across the same code you've posted. The rgl package however gave me an error: "package or namespace load failed for ‘rgl’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘digest’". In addition, I also have these warning messages: "package built under R version 3.5.3". I am trying to sort these out but in general I don't think they can explain my "could not find function "raw.data""-error which is the most concerning one.


If that's the case, you will also need to manually install the digest package. You can simply use install.packages("digest").


Thank you, I actually managed to make it work, but that script only accepts a data frame with 4 columns, whereas mine has more, so I cannot use it. What scripts or software do you normally use to carry out similar checks?


I usually just read in the data.frame and do the filtering myself.

E.g., if my data.frame (df) has an INFO column and a MAF column and I want to filter by INFO > 0.8 and MAF > 0.05, I can do

res <- subset(df, INFO > 0.8 & MAF > 0.05)
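The duplicate-position and multi-allelic checks can be handled in base R in a similar way. This is only a sketch: the column names (SNP, A1, A2, CHR, BP) are assumed stand-ins for the headers described in the question, and the toy rows are made up for illustration. It treats any position that appears on more than one row as a duplicate or multi-allelic site and drops every such row.

```r
# Toy summary statistics; all values are made up for illustration.
df <- data.frame(
  SNP = c("1:15791:C:T", "1:15791:C:G", "1:20000:A:G", "2:500:G:A", "2:500:G:A"),
  A1  = c("C", "C", "A", "G", "G"),
  A2  = c("T", "G", "G", "A", "A"),
  CHR = c(1, 1, 1, 2, 2),
  BP  = c(15791, 15791, 20000, 500, 500),
  stringsAsFactors = FALSE
)

# Key each row by chromosome and base-pair position.
pos <- paste(df$CHR, df$BP, sep = ":")

# Flag every row whose position occurs more than once. Scanning from both
# ends marks the first occurrence as well as the later ones.
dup <- duplicated(pos) | duplicated(pos, fromLast = TRUE)

# Keep only SNPs at positions that occur exactly once.
res <- df[!dup, ]
res$SNP
```

To keep one row per position instead of dropping all of them, `df[!duplicated(pos), ]` would retain the first occurrence.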

Thank you, I think I am overcomplicating this then! I will try to do it manually too.
