I am trying to perform two-sample Mendelian randomization (MR) on over 300 text files of exposure data. All the files have the same column names (Bacteria, chr, bp, rsID, ref.allele, eff.allele, beta, SE, Pvalue), and I have kept all the .txt files in one folder.
How can I load the .txt files in the folder and run two-sample MR on each of them with a single R script in RStudio? I tried a for loop, but I don't know how to substitute the file name on each iteration. My ultimate goal is to run the two-sample MR analysis on each of the .txt files and get the result of each analysis.
setwd("D:/Project")
data_files <- list.files("p less than 5e-8") # Identify file names in the folder "p less than 5e-8"
data_files # print all the files
library(TwoSampleMR)
ao <- available_outcomes()
for(i in 1:length(data_files)) { # Head of for-loop
  exposure_dat <- read_exposure_data(
    filename = data_files[i],
    sep = '\t',
    snp_col = 'rsID',
    beta_col = 'beta',
    se_col = 'SE',
    effect_allele_col = 'eff.allele',
    phenotype_col = '',
    units_col = '',
    other_allele_col = 'ref.allele',
    eaf_col = '',
    samplesize_col = '',
    ncase_col = '',
    ncontrol_col = '',
    gene_col = '',
    pval_col = 'P.value'
  )
  exposure_dat <- clump_data(exposure_dat)
  exposure_dat
  outcome_dat <- extract_outcome_data(exposure_dat$SNP, c('ieu-a-1058'), proxies = 1, rsq = 0.8, align_alleles = 1,
                                      palindromes = 1, maf_threshold = 0.3)
  dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)
  mr_results <- mr(dat)
  mr_results
}
The above is my for-loop R script. However, it does not work.
If I input the files one at a time with the following script, it works perfectly. The line `filename = '2_family.Bifidobacteriaceae.txt'` is what selects the file for the analysis.
library(TwoSampleMR)
ao <- available_outcomes()
exposure_dat <- read_exposure_data(
  filename = '2_family.Bifidobacteriaceae.txt',
  sep = '\t',
  snp_col = 'rsID',
  beta_col = 'beta',
  se_col = 'SE',
  effect_allele_col = 'eff.allele',
  phenotype_col = '',
  units_col = '',
  other_allele_col = 'ref.allele',
  eaf_col = '',
  samplesize_col = '',
  ncase_col = '',
  ncontrol_col = '',
  gene_col = '',
  pval_col = 'Pvalue'
)
exposure_dat <- clump_data(exposure_dat)
outcome_dat <- extract_outcome_data(exposure_dat$SNP, c('ieu-a-1058'), proxies = 1, rsq = 0.8, align_alleles = 1,
                                    palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)
mr_results <- mr(dat)
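Comparing the loop with the working single-file script points to a few likely problems. The sketch below assumes TwoSampleMR is installed and the files sit in `D:/Project/p less than 5e-8`; it has not been run against the actual files. First, `list.files()` returns bare names such as `2_family.Bifidobacteriaceae.txt` while the working directory is `D:/Project`, so `read_exposure_data()` looks for each file in the wrong folder; `full.names = TRUE` keeps the folder in each path. Second, the loop uses `pval_col = 'P.value'`, which does not match the `Pvalue` column the working script uses. Third, `mr_results` is overwritten on every iteration, and bare expressions are not auto-printed inside a `for` loop, so the sketch keeps each table in a named list and prints it explicitly. The `.txt` pattern filter is an added assumption.

```r
# Sketch of a corrected loop, assuming TwoSampleMR is installed and the
# exposure files are in "D:/Project/p less than 5e-8" (path from the question).
data_dir <- "D:/Project/p less than 5e-8"

# full.names = TRUE keeps the folder in each path, so the files are found
# regardless of the working directory. The .txt filter is an assumption.
data_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# One named slot per file, so results are kept rather than overwritten.
mr_results_list <- vector("list", length(data_files))
names(mr_results_list) <- basename(data_files)

for (i in seq_along(data_files)) {   # seq_along() does nothing if no files match
  exposure_dat <- TwoSampleMR::read_exposure_data(
    filename = data_files[i],
    sep = "\t",
    snp_col = "rsID",
    beta_col = "beta",
    se_col = "SE",
    effect_allele_col = "eff.allele",
    phenotype_col = "",
    units_col = "",
    other_allele_col = "ref.allele",
    eaf_col = "",
    samplesize_col = "",
    ncase_col = "",
    ncontrol_col = "",
    gene_col = "",
    pval_col = "Pvalue"              # must match the column name in the files
  )
  exposure_dat <- TwoSampleMR::clump_data(exposure_dat)
  outcome_dat <- TwoSampleMR::extract_outcome_data(
    exposure_dat$SNP, "ieu-a-1058", proxies = 1, rsq = 0.8,
    align_alleles = 1, palindromes = 1, maf_threshold = 0.3
  )
  dat <- TwoSampleMR::harmonise_data(exposure_dat, outcome_dat, action = 2)
  mr_results_list[[i]] <- TwoSampleMR::mr(dat)
  print(mr_results_list[[i]])        # explicit print: loops do not auto-print
}
```

After the loop, `mr_results_list[["2_family.Bifidobacteriaceae.txt"]]` would hold the MR table for that file.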
Vectorization is better than loops. Use:
lapply(files, read_exposure_data, sep = "\t", snp_col = 'rsID', ...)
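`lapply` only replaces the loop: each file still has to be clumped, matched to the outcome, harmonised and passed to `mr()`. A fuller sketch, assuming TwoSampleMR is installed, wraps the question's working single-file script in a hypothetical helper `run_mr_for_file()` and stacks the per-file tables with a second hypothetical helper, `combine_mr_results()`, which adds a `file` column. Most of the run time is likely spent in the remote queries made by `clump_data()` and `extract_outcome_data()`, so `lapply` here mainly makes collecting the results tidier rather than faster.

```r
# Hypothetical helper: the working single-file script from the question,
# wrapped so it can be applied to any exposure file path.
run_mr_for_file <- function(path, outcome_id = "ieu-a-1058") {
  exposure_dat <- TwoSampleMR::read_exposure_data(
    filename = path, sep = "\t",
    snp_col = "rsID", beta_col = "beta", se_col = "SE",
    effect_allele_col = "eff.allele", other_allele_col = "ref.allele",
    pval_col = "Pvalue",
    phenotype_col = "", units_col = "", eaf_col = "", samplesize_col = "",
    ncase_col = "", ncontrol_col = "", gene_col = ""
  )
  exposure_dat <- TwoSampleMR::clump_data(exposure_dat)
  outcome_dat <- TwoSampleMR::extract_outcome_data(
    exposure_dat$SNP, outcome_id, proxies = 1, rsq = 0.8,
    align_alleles = 1, palindromes = 1, maf_threshold = 0.3
  )
  dat <- TwoSampleMR::harmonise_data(exposure_dat, outcome_dat, action = 2)
  TwoSampleMR::mr(dat)
}

# Hypothetical helper: stacks a named list of mr() tables into one data frame,
# skipping NULL entries and recording the source file in a `file` column.
combine_mr_results <- function(results) {
  results <- Filter(Negate(is.null), results)
  tagged <- lapply(names(results), function(f) {
    res <- results[[f]]
    res$file <- rep(f, nrow(res))   # rep() also copes with zero-row tables
    res
  })
  do.call(rbind, tagged)            # NULL when there is nothing to combine
}

# Usage (folder path taken from the question).
data_files <- list.files("D:/Project/p less than 5e-8",
                         pattern = "\\.txt$", full.names = TRUE)
results <- lapply(data_files, run_mr_for_file)
names(results) <- basename(data_files)
all_results <- combine_mr_results(results)
```

`all_results` can then be written out in one go, for example with `write.csv(all_results, "mr_results.csv", row.names = FALSE)`.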
Thank you, Sam. I tried your method and I think it should work. However, TwoSampleMR prints the message 'Please look at vignettes for options on running this locally if you need to run many instances of this command.'
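As far as I can tell, that message is informational rather than an error: it appears to come from the clumping step, which by default sends its request to a remote server, and it points to the package vignettes for running clumping locally when many requests are needed. With 300+ files, an error on any single file (for example a network error, or an input that leaves nothing to analyse) would still stop a plain loop or `lapply` call. One way around that is a hypothetical wrapper `safe_run()` that records `NULL` for a failing file and moves on; the optional pause between files is an assumption for spacing out requests, not a documented requirement.

```r
# Hypothetical wrapper: applies `analyse` to each file path, records NULL for
# files whose analysis throws an error, and carries on with the rest.
safe_run <- function(paths, analyse, pause_seconds = 0) {
  results <- lapply(paths, function(path) {
    res <- tryCatch(
      analyse(path),
      error = function(e) {
        message("Skipping ", basename(path), ": ", conditionMessage(e))
        NULL
      }
    )
    if (pause_seconds > 0) Sys.sleep(pause_seconds)  # optional gap between files
    res
  })
  names(results) <- basename(paths)
  results
}
```

Here `analyse` is any function that takes one file path and returns that file's `mr()` table, for instance the working single-file script above wrapped in a function. Afterwards, `names(which(vapply(results, is.null, logical(1))))` lists the files that need a second look.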
Thank you, Ram. I agree that `lapply` will be better and faster. The usage is `read_exposure_data(filename, sep = "\t", ...)`, and I am trying to work out how to call it with `lapply`.
It's not intuitive, but you can pass all the arguments that you'd pass to `read_exposure_data` directly to `lapply` instead. `lapply` will pass them on to each call of `read_exposure_data`. Example:
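A minimal sketch, assuming TwoSampleMR is installed and the files are in the `p less than 5e-8` folder from the question. `read_all_exposures()` is a hypothetical helper name; inside it, `lapply()` hands each file path to `read_exposure_data()` as its first argument (`filename`) and forwards every other named argument unchanged to each call.

```r
# Hypothetical helper: reads every .txt exposure file in `data_dir` into a
# named list, one formatted exposure data frame per file.
read_all_exposures <- function(data_dir) {
  data_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
  exposure_list <- lapply(
    data_files,                       # each path becomes `filename`
    TwoSampleMR::read_exposure_data,  # the function called once per path
    sep = "\t",                       # every argument below goes to each call
    snp_col = "rsID",
    beta_col = "beta",
    se_col = "SE",
    effect_allele_col = "eff.allele",
    other_allele_col = "ref.allele",
    pval_col = "Pvalue",
    phenotype_col = "", units_col = "", eaf_col = "", samplesize_col = "",
    ncase_col = "", ncontrol_col = "", gene_col = ""
  )
  names(exposure_list) <- basename(data_files)
  exposure_list
}
```

Calling `exposure_list <- read_all_exposures("D:/Project/p less than 5e-8")` should then give one element per file, so `exposure_list[["2_family.Bifidobacteriaceae.txt"]]` corresponds to what the single-file call reads.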