I am running 325 vcftools commands to generate Fst values, which obviously needs to be automated.
An example:
vcftools --vcf big.vcf --weir-fst-pop pop_lists/pop1.txt --weir-fst-pop pop_lists/pop2.txt --out weir_fst_results/pop1_vs_pop2
and when I run this job, it works fine when I run it one by one by the command line, i.e. there are two files:
pop1_vs_pop2.log
and
pop1_vs_pop2.weir.fst
I need to get the Fst values from the log file.
The problem is that the log files are not generated when I run from my perl script, even though the commands executed are identical.
This sounds so trivial, but if it's not producing the log file, I'm wasting my time.
How can I get vcftools to produce the log file when executing from a script? why isn't vcftools producing the log file?
Perhaps there is a way of calculating this that doesn't involve using vcftools?
the script to run them:
sub execute {
my $command = shift;
print "Executing: $command\n";
if (system($command) != 0) {
say "$command failed.";
die
}
}
my @pop = list_regex_files('^[A-Y]{3}\.txt$', 'ethnicity_lists');
my $outdir = 'weir_fst_results';
if (not -d $outdir) {
mkdir $outdir
}
my $vcf = 'big.vcf';
if (not -e $vcf) {
say "$vcf doesn't exist.";
die
}
foreach my $pop1 (0..$#pop) {
my $label1;
if ($pop[$pop1] =~ m/^ethnicity_lists\/([A-Y]{3})\.txt$/) {
$label1 = $1
} else {
say "$pop[$pop1] failed regex.";
die;
}
foreach my $pop2 (($pop1+1)..$#pop) {
my $label2;
if ($pop[$pop2] =~ m/^ethnicity_lists\/([A-Y]{3})\.txt$/) {
$label2 = $1
} else {
say "$pop[$pop2] failed regex.";
die;
}
my $out_prefix = "weir_fst_results/$label1" . "_vs_$label2";
my $cmd = "vcftools --vcf $vcf ";
$cmd .= "--weir-fst-pop $pop[$pop1] --weir-fst-pop $pop[$pop2] ";
$cmd .= " --out $out_prefix > $out_prefix.out";
execute($cmd);
}
}
If I try to execute each comparison in a separate directory, with a separate nohup
command, unfortunately all command output is still routed to the source nohup.out
, and then the log file is empty, which defeats the purpose of running this.
Can you show us the relevant section of your perl script that runs these lines?
@RamRS I've updated the question
Try
echo
instead ofexecute
and check the command. Also, look at the manual page forexecute
just to make sure STDIN and STDOUT are not being interfered with (not that it makes a difference when theout
is explicitly specified, but just in case)Have you checked if
list_regex_files()
does work properly?What is stored in
@pop
after running line?
@zubenel
list_regex_files
just contains a list of files, nothing elseA good way of debugging this is to print out the
$cmd
variable. I guess that it will be different from what you expect.the problem is that the exact same command works differently when executed from the perl script. The output file is absent (and thus it is pointless to run vcftools at all). Is there a different program that calculates Fst? Vcftools does not work very well.
because the output file is absent, but the .fst file is present, is there a way that I can get the weighted score from the fst file?
Try using
system()
instead ofexecute()
.unfortunately, that didn't change anything. Is vcftools really the only tool available that can calculate Fst? This problem sounds so trivial, but it's useless without that log file.
I'm not really sure. Do Fst files contain F-statistics like this one? If so, you can use plink.
It is definitely odd that a command that works from the shell does not work from a
system()
call. Can you try running the command from a non-interactive shell?If the above command fails, then we will know the reason the perl call and the shell command work differently - that the shell command relies on some sort of alias or profile setting.
I thought that it might be my perlbrew installation, but that doesn't appear to be the case. The first cycle of parallel runs writes the log files, after that, no more log files, and no explanation. The commands are run identically. Looks like I have to give those other programs a try. Thanks for your help
"No explanation" could be because STDERR is being suppressed by perl. Try executing the command from perl but this time, suffix the command with
>cmd.log 2>&1
. This will capture STDERR and STDOUT to thecmd.log
file, which will reveal any problems.I'm running a test now, it should finish in about a day I think (on a small data set)
the problem is that at some point vcftools stops making the output file, and switches the important info to STDERR. I've never seen another program do this before.
That's odd, but then this behavior should have been replicated when you ran it from the shell as well. Why do you think that did not happen?
Some programs might try to determine if they are running in an interactive shell or not. Under some circumstances, Perl runs programs via execvp instead of using the shell, see https://perldoc.perl.org/functions/system.html This should only happen in case there are no shell metacharacters. You have ">" but to force a shell, you might try "/bin/sh -c " to force an additional layer, just in case the log file generation depends on how the program is invoked.
Hello!
Right now i´m having the same trouble using vcftools in a Nextflow pipeline. Did you find an answer?
If your question is more NextFlow related then you may want to ask it in their slack.