Entering edit mode
2.4 years ago
Manuel
▴
50
I am analysing the average depth of a bam file I show the full code below.
When I used in the terminal this
awk '{ total += $3; count++ } END { print total/count }' AVERAGE_DEPTH_GOOD_CHR_MORE_THAN_20.txt
This gave me 131.3
However when I put this in a subprocess
command = "awk '{ total += $3; count++ } END { print total/count }' AVERAGE_DEPTH_GOOD_CHR_MORE_THAN_20.txt"
subprocess.Popen(command, shell=True)
I got this awk: cmd. line:1: fatal: division by zero attempted
Why??
The input file in the same in both cases
This is the full script
#!/usr/bin/env python3
import subprocess
import sys
subprocess.Popen('module load samtools', shell=True)
# Take bam file to analysis
#bam = input()
bam = '220706_M00321_0141_000000000-KFRDV_validation8_WRGL4_1190863240.bam'
print(f'Analysing Bam file {bam}')
# Limite the bam file to ROI taken tyhe bed file used
def run(cmd) :
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
proc.communicate()
cmd = f"samtools view {bam} --target-file ../WRGL4_hg19_nonstandard-expanded_v04.bed -bo ROI.bam"
run(cmd)
print('BAM file limited to ROI created')
print('Getting average depth of all bases')
def run(coverage) :
proc = subprocess.Popen(coverage, shell=True, stdout=subprocess.PIPE)
proc.communicate()
coverage = f"samtools depth ROI.bam > AVERAGE_DEPTH.txt"
run(coverage)
print('AVERAGE_DEPTH.txt created')
print('Removing unknown chromosomes')
keep = ('1','2','3','4','5','6','7','8','9','X','Y')
with open('AVERAGE_DEPTH.txt', 'r') as inputFile, open('AVERAGE_DEPTH_ONLY_GOOD_CHR.txt', 'w') as outputFile:
for line in inputFile.readlines():
if line.startswith (keep):
outputFile.writelines(line)
print('AVERAGE_DEPTH_ONLY_GOOD_CHR.txt created')
subprocess.Popen("awk '$3 >= 21' AVERAGE_DEPTH_ONLY_GOOD_CHR.txt > AVERAGE_DEPTH_GOOD_CHR_MORE_THAN_20.txt", shell=True, stdout=subprocess.PIPE)
print('AVERAGE_DEPTH_GOOD_CHR_MORE_THAN_20.txt created')
print('\n Average depth is:')
command = "awk '{ total += $3; count++ } END { print total/count }' AVERAGE_DEPTH_GOOD_CHR_MORE_THAN_20.txt"
subprocess.Popen(command, shell=True)
That error seems to imply an empty file, and your command works fine with a made-up test file for me. Are you sure you tested it with the same file contents your script is working with? If that's not it, maybe try with a simplified awk command for troubleshooting, like:
That will just print the first 10 lines, so, like running
head
. (But again, I expect your actual file is empty so this wouldn't show you anything.)The file is not empty because I have delete every file create in previous run, I have run the script again and then I have done
I have introduce the new line you proposed in your comment and the error is even more strange. I show you the full output
Thanks for your time by the way
That is even weirder. I'm not sure what's going on there.
Maybe just run the problematic awk itself within Python, to narrow it down? You can explicitly split the arguments yourself as a list before calling the process rather than invoking a separate shell instance, which the Python docs claim is supposedly necessary when there are arguments:
Even though giving extra arguments in a string does seem to work as far as I can tell... but I always use a list, myself. I think that will make the situation simpler:
But you could even drop awk altogether, really... you could take the average of your third column with few lines of plain Python and not need a subprocess at all.