I have 44 .tsv files in one folder and I want to calculate the number of intersect of each pairwise with intersect command of bedtools tool. each output file would have 4 coloums and I just need to save only sum of value of coloumn 4 in each output file. I can do it easily when I do it by one one but when I use parallel processing to do the whole process at the same time I get syntax error
here is the code and result when I try each two pairs by one one manually:
$ bedtools intersect -a p1.tsv -b p2.tsv -c
chr1 1 5 1
chr1 8 12 1
chr1 18 20 1
chr1 21 25 0
bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}
3
here is the code and error when I am using parallel processing:
$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
whith double quotes,
$
is interpreted as a SHELL variable . It must be escaped: https://unix.stackexchange.com/questions/162476I found using
GNU parallel
andawk
tricky because of having to escape quotes and such. How many cores do you have on your computer? If it is only two, then I would just go with a non-parallelbash
for
loop.