Question

Reading multiple txt files and performing two-sample Mendelian randomization in R

4.0 years ago

mscpythonstudy ▴ 20

I am trying to perform two-sample Mendelian randomization (MR) on over 300 text files of exposure data. All the files have the same column names (Bacteria, chr, bp, rsID, ref.allele, eff.allele, beta, SE, Pvalue), and I have kept all of them in one folder.

How can I load the txt files in the folder and perform two-sample MR on each of them with a single R script in RStudio? I tried a for loop, but I don't know how to substitute each file name into the loop. My ultimate goal is to run the two-sample MR analysis on every txt file and get the result of each analysis.

setwd("D:/Project")                                                           
data_files <- list.files("p less than 5e-8")            # Identify file names in the folder "p less than 5e-8"
data_files                                                            # print all the file names

library(TwoSampleMR)
ao <- available_outcomes()
for(i in 1:length(data_files)) {                              # Head of for-loop
  exposure_dat <- read_exposure_data(
    filename = data_files[i],
    sep = '\t',
    snp_col = 'rsID',
    beta_col = 'beta',
    se_col = 'SE',
    effect_allele_col = 'eff.allele',
    phenotype_col = '',
    units_col = '',
    other_allele_col = 'ref.allele',
    eaf_col = '',
    samplesize_col = '',
    ncase_col = '',
    ncontrol_col = '',
    gene_col = '',
    pval_col = 'P.value'
  )
  exposure_dat <- clump_data(exposure_dat)
  exposure_dat
  outcome_dat <- extract_outcome_data(exposure_dat$SNP, c('ieu-a-1058'), proxies = 1, rsq = 0.8, align_alleles = 1, 
  palindromes = 1, maf_threshold = 0.3)
  dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)
  mr_results <- mr(dat)
  mr_results

The above is my for-loop R script. However, it does not work.
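A likely explanation, judging from the code alone: list.files() returns bare file names such as 2_family.Bifidobacteriaceae.txt, not paths. With the working directory set to D:/Project, read_exposure_data() then looks for each file in D:/Project instead of the "p less than 5e-8" subfolder. Passing full.names = TRUE prefixes each name with the folder. Separately, the loop uses pval_col = 'P.value' where the working script uses 'Pvalue', it is missing its closing }, and it overwrites mr_results on every pass (inside a loop, a bare exposure_dat or mr_results line also prints nothing). The sketch below shows the path difference with throwaway files in a temporary folder; the names a.txt and b.txt are made up.

```r
# Throwaway folder standing in for "p less than 5e-8" (file names are made up)
demo_dir <- file.path(tempdir(), "p less than 5e-8")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.txt", "b.txt"))))

# Default: bare file names, which R resolves against the working directory
bare <- list.files(demo_dir)
print(bare)                   # "a.txt" "b.txt"

# full.names = TRUE: each name is prefixed with the folder, so it can be opened
paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
print(file.exists(paths))     # TRUE TRUE

# In the loop, these paths would go to read_exposure_data(filename = paths[i], ...)
```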

If I input the files one by one using the following script, it works perfectly. The line filename = '2_family.Bifidobacteriaceae.txt' is what specifies the input file for the analysis.

library(TwoSampleMR)
ao <- available_outcomes()
exposure_dat <- read_exposure_data(
   filename = '2_family.Bifidobacteriaceae.txt',
   sep = '\t',
   snp_col = 'rsID',
   beta_col = 'beta',
   se_col = 'SE',
   effect_allele_col = 'eff.allele',
   phenotype_col = '',
   units_col = '',
   other_allele_col = 'ref.allele',
   eaf_col = '',
   samplesize_col = '',
   ncase_col = '',
   ncontrol_col = '',
   gene_col = '',
   pval_col = 'Pvalue'
)
exposure_dat <- clump_data(exposure_dat)
outcome_dat <- extract_outcome_data(exposure_dat$SNP, c('ieu-a-1058'), proxies = 1, rsq = 0.8, align_alleles = 1, 
palindromes = 1, maf_threshold = 0.3)
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)
mr_results <- mr(dat)
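One way to make the working script reusable is to wrap it in a function whose argument is the file path, so the loop body becomes a single call. In the sketch below, run_mr_one_file and its outcome_id argument are hypothetical names; the TwoSampleMR arguments are copied from the script above, with pval_col = 'Pvalue' matching the files' column. The line ao <- available_outcomes() is left out because ao is never used. Note that clump_data() and extract_outcome_data() query the remote IEU OpenGWAS service, so over 300 files the network calls, rather than the choice between for and lapply, will likely dominate the run time.

```r
# Hypothetical helper: the working single-file script, with the file path
# (and, optionally, the outcome ID) turned into arguments.
run_mr_one_file <- function(path, outcome_id = "ieu-a-1058") {
  exposure_dat <- TwoSampleMR::read_exposure_data(
    filename = path,
    sep = "\t",
    snp_col = "rsID",
    beta_col = "beta",
    se_col = "SE",
    effect_allele_col = "eff.allele",
    other_allele_col = "ref.allele",
    pval_col = "Pvalue",   # matches the column name in the files
    phenotype_col = "", units_col = "", eaf_col = "", samplesize_col = "",
    ncase_col = "", ncontrol_col = "", gene_col = ""
  )
  # Clumping and outcome lookup both contact the remote server
  exposure_dat <- TwoSampleMR::clump_data(exposure_dat)
  outcome_dat <- TwoSampleMR::extract_outcome_data(
    exposure_dat$SNP, outcome_id, proxies = 1, rsq = 0.8,
    align_alleles = 1, palindromes = 1, maf_threshold = 0.3
  )
  dat <- TwoSampleMR::harmonise_data(exposure_dat, outcome_dat, action = 2)
  TwoSampleMR::mr(dat)     # return the results data frame for this file
}
```

Defining the function does not contact the server; the TwoSampleMR calls only run when it is invoked on a path.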

Answer 1 · 2021-03-03

4.0 years ago

Sam ★ 4.8k

One way to do it would be to use files <- list.files(pattern = "\\.txt$"), which stores all the file names in the files object.

You can then write a for loop:

for(i in files){
    exposure_dat <- read_exposure_data( filename = i, .....)
    .......
}
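Building on this answer, the sketch below keeps each file's result in a named list instead of overwriting one variable, and uses tryCatch() so that one failing file (for example, one where no SNPs survive clumping or none are found in the outcome) is reported and skipped instead of stopping the whole run. The pattern is \\.txt$ because the question's files are .txt, and full.names = TRUE makes the paths resolve from the current working directory. run_all_files is a hypothetical name; analyse stands for whatever per-file function you use.

```r
# Run `analyse` on every file in `dir` that matches `pattern`, keeping one
# result per file in a list named after the file.
run_all_files <- function(dir, analyse, pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  results <- vector("list", length(files))
  names(results) <- basename(files)

  for (f in files) {
    res <- tryCatch(
      analyse(f),
      error = function(e) {
        # Report the failing file and carry on with the next one
        message("Skipping ", basename(f), ": ", conditionMessage(e))
        NULL
      }
    )
    # Single-bracket assignment of list(res) keeps NULL entries in place;
    # results[[name]] <- NULL would delete the element instead
    results[basename(f)] <- list(res)
  }
  results
}
```

For example, run_all_files("p less than 5e-8", analyse = f), where f is a function that takes one path and runs read_exposure_data(), clump_data(), extract_outcome_data(), harmonise_data() and mr() on it. Entries for files that failed are NULL.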

Using lapply is more idiomatic than writing an explicit for loop.

Use lapply(files, read_exposure_data, sep="\t", snp_col='rsID',...)

4.0 years ago by Ram 44k
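Strictly speaking, lapply() still calls the function once per element, so it is mainly tidier than a for loop rather than faster. Its practical advantage is that it returns a list with one element per file, which makes collecting the results easy. The sketch below applies a per-file function with lapply(), names the results by file, and stacks the per-file data frames (mr() returns one data frame per run) into a single table with a file column. analyse_and_stack is a hypothetical helper, and an error in any one file stops the whole call unless analyse handles it itself, for example with tryCatch().

```r
# Apply `analyse` to each file with lapply(), then stack the per-file data
# frames into one table with a `file` column naming the source file.
analyse_and_stack <- function(files, analyse) {
  results <- lapply(files, analyse)              # one element per file
  names(results) <- basename(files)
  results <- Filter(Negate(is.null), results)    # drop files that returned NULL
  if (length(results) == 0) return(data.frame())

  # Prepend the file name to each data frame, then bind them row-wise
  tagged <- Map(function(df, nm) cbind(file = nm, df), results, names(results))
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}
```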

Thank you, Sam. I tried your method, and I think it should work. However, TwoSampleMR pops up the message 'Please look at vignettes for options on running this locally if you need to run many instances of this command.'

4.0 years ago by mscpythonstudy ▴ 20

Thank you, Ram. I agree lapply will be better and faster. The usage is read_exposure_data(filename, sep = "\t", ......); I'm trying to work it out with lapply.

4.0 years ago by mscpythonstudy ▴ 20

It's not intuitive, but you can pass all the arguments you would give read_exposure_data directly to lapply instead; lapply passes them on to each call of read_exposure_data. Example:

lapply(files, read.table, header=TRUE, sep="\t", stringsAsFactors=FALSE)

4.0 years ago by Ram 44k
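To illustrate the forwarding described above, the sketch below writes two small tab-separated files to a temporary folder and reads them both ways: with the extra arguments passed through lapply(), and with an explicit anonymous function. The two results are identical. The file names and contents are made up; the commented TwoSampleMR call at the end applies the same pattern to the question's columns and is not run here.

```r
# Two small tab-separated files in a temporary folder (illustrative data)
d <- file.path(tempdir(), "lapply_demo")
dir.create(d, showWarnings = FALSE)
writeLines(c("rsID\tbeta", "rs1\t0.1"), file.path(d, "one.txt"))
writeLines(c("rsID\tbeta", "rs2\t0.2"), file.path(d, "two.txt"))
files <- list.files(d, pattern = "\\.txt$", full.names = TRUE)

# Arguments after the function are forwarded by lapply() to every call ...
forwarded <- lapply(files, read.table, header = TRUE, sep = "\t")
# ... which is equivalent to spelling the call out in an anonymous function
explicit <- lapply(files, function(f) read.table(f, header = TRUE, sep = "\t"))

print(identical(forwarded, explicit))   # [1] TRUE

# Same pattern with TwoSampleMR (not run here; assumes the question's columns):
# exposures <- lapply(files, TwoSampleMR::read_exposure_data, sep = "\t",
#                     snp_col = "rsID", beta_col = "beta", se_col = "SE",
#                     effect_allele_col = "eff.allele",
#                     other_allele_col = "ref.allele", pval_col = "Pvalue")
```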