running a perl script multiple times; input and output files are required
7.4 years ago
Ana ▴ 200

Hi everyone, I want to run a Perl script on multiple input files. I have searched a lot but could not find a relevant solution. The script requires 2 input files and 1 output file; here are its first few lines:

#!/usr/bin/perl
use warnings;
use strict;
unless (@ARGV == 3) {die;}
my $in = $ARGV[0]; #merged bsnp table loci 
my $pop = $ARGV[1];# list of population with ind \t pop
my $out = $ARGV[2]; #name of out file

The first input file is my snp_table, the second input file is my population file, and the output file should be an empty file. When I have only one snp_table, I simply use this command to run the script:

./my_script.pl  snp.tab.table  population.txt  results.out.txt

where snp.tab.table and population.txt are input files and results.out.txt is the empty output file.

The problem I am struggling with is that now I want to "run this script in parallel for 500 SNP_table files" and store the output of each run in a separate output file (population.txt is the same for all SNP files). I am trying something like this:

#!/bin/bash

    for file in snp_table_files/*
    do
        ./script.pl “$file” population.txt "outfiles/$(basename "$file” .txt).out”
    done

In the command above, I have stored all of the SNP tables in the snp_table_files/ directory and created an empty directory, outfiles/, for the output files, but my code does not work. Could you please help me with that? Any help or suggestion is appreciated!

perl parallelization

What is the error you get? Empty files?

When you say in parallel, do you actually mean in parallel? Because your code runs them sequentially.


Parallel or sequential does not really matter; I just want to run this script for 500 files. I do not get any error message by running this code, but nothing happens and I find my output directory empty. Do you have any idea how I can do this job?


With so many "quotes" involved, I'd debug the code, as Bash command-line parsing is full of quirks. So:

  1. Put only one SNP file in the snp_table_files folder.
  2. Modify my_script.pl so that it prints the values of $ARGV[0] - $ARGV[2].
  3. Check that the output is properly redirected in my_script.pl.
  4. Run the above bash script to debug it. Does it produce the right names for all the ARGVs? (See the sketch after this list.)
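
For step 4, a minimal sketch that only echoes the three arguments (without running the Perl script at all) makes any quoting problem visible immediately:

    for file in snp_table_files/*; do
        echo "ARG1=[$file] ARG2=[population.txt] ARG3=[outfiles/$(basename "$file" .txt).out]"
    done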

PS: I am almost sure it's a parsing issue. Your

./script.pl “$file” population.txt "outfiles/$(basename "$file” .txt).out”

line may be producing something other than exactly three ARGVs, and since you have used unless (@ARGV == 3) {die;}, the program is exiting without doing anything.
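
To see why the curly ("smart") quotes matter, here is a small demonstration with a hypothetical string: bash treats curly quotes as ordinary characters, so they neither group words nor get removed:

    set -- “hello world”   # curly quotes: splits into two words, quote characters kept
    echo $#                # prints 2
    set -- "hello world"   # plain ASCII quotes: one word, quotes removed
    echo $#                # prints 1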


Yeah, your output file specification is probably what is breaking things. Try something like:

 for file in snp_table_files/* ; do
     # strip the directory so the output lands directly in outfiles/
     perl ./script.pl "$file" population.txt "outfiles/$(basename "$file").out"
 done
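
If you also want the 500 runs to happen truly in parallel rather than one after another, GNU parallel (assuming it is installed) can replace the loop; {/.} is the input file's basename with its extension removed:

    parallel -j 16 ./script.pl {} population.txt outfiles/{/.}.out ::: snp_table_files/*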
7.4 years ago

create a Makefile:

# collect every SNP table found under snp_table_files/
TABLE=$(shell find snp_table_files -type f)

# template rule: build one output in outfiles/ per input table
# (the recipe line below must be indented with a tab)
define run

outfiles/$(notdir $(1)).txt : $(1) population.txt
	./script.pl $$< $$(word 2,$$^) $$@

endef

# default goal: one outfiles/<table name>.txt per input table
all: $(addsuffix .txt,$(addprefix outfiles/,$(notdir ${TABLE})))

# instantiate the template rule for every table
$(eval $(foreach T,${TABLE},$(call run,$T)))
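
For one hypothetical input file snp_table_files/chr1.table, the $(eval $(call run,...)) line above expands to an ordinary rule like this (the recipe line starts with a tab), so the output file name is derived from the input name automatically:

outfiles/chr1.table.txt : snp_table_files/chr1.table population.txt
	./script.pl $< $(word 2,$^) $@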

invoke make with the number of parallel tasks:

make -j 16

Thanks Pierre, I have some questions though

find snp_table_files, so you mean I should give the path to my genotype files? And what about output files? The script requires output files as well.


so you mean I should give the path to my genotype files

put whatever you want / you need;

you can also write the paths by hand

TABLE= dir1/dir2/file1.txt \
     dir1/dir1/file1.txt \
     dir1/dir2/file2.txt \
     dir1/dir3/file4.txt

the script requires output files as well

the output file is generated by the Makefile rule. Just run

make -n

to see what would happen (dry run)
