Popen Blat has different behavior than same syntax in shell
1
1
Entering edit mode
8.2 years ago
nchuang ▴ 260

If I type this in my shell: blat /myseq/hg19.fasta nonref.fa -out=blast8 blat.blast8

I get a valid output. However, if I try to automate this in python 3.5: proc=Popen(["blat",human_ref,"nonref.fa","-out=blast8","blat.blast8"],stdout=PIPE,universal_newlines=True)

It will run and load the entire genome but it won't read the query file. This is the output I captured:

>>> Loaded 3101804739 letters in 84 sequences
>>> Searched 0 bases in 0 sequences

I have tried with shell=True and also submitting the cmd as a single string but they do not work as well.

Update: For some reason by removing stdout=PIPE and the universal_lines arguments it now works. I first noticed from looking at old stackoverflow posts on this issue someone used subprocess.call. I noticed that it is a wrapper for Popen. I tried call and it worked even giving me stdout automatically without me having to capture it. I did the same thing with Popen and it worked as well. It seems the culprit was the stdout=PIPE argument. I really don't understand it but I'm just glad it works at this point.

Final Update Thanks for the help John. Seems like I was an idiot and didn't close the file handle right before telling blat to read the file. My final testing of the code shows that you can have stdout=PIPE and it will work as well. Thanks again for trying!

python blat • 2.7k views
ADD COMMENT
0
Entering edit mode

Just a total guess since i've never run blat from python, but i suspect it knows you're not running blat in a terminal and is expecting the nonref.fa from the stdin.

First I would try with shell=True and a static string, so just: proc = subprocess.Popen('blat /myseq/hg19.fasta nonref.fa -out=blast8 blat.blast8', shell=True) and nothing else. It will print to the stdout/err which is fine for testing.

Then try it again with stdout=subprocess.PIPE. If that changes things then the issue is fixable but i won't type it out unless that really is the reason.

ADD REPLY
1
Entering edit mode

The static string runs my script and then I see the correct output with correct input given. When I add the stdout=PIPE with universal_newlines=True, it gives me the same 0 bases in 0 sequences.

ADD REPLY
0
Entering edit mode

Sorry to be a pain, but can you remove the universal_newlines=True so we can tell if it's just the stdout=PIPE? I think that's the culprit but I don't want to make a mistake.

ADD REPLY
0
Entering edit mode

I had to add that so I could capture the output as a string:

for line in iter(proc.stdout.readline,''):
    print(">>> "+line.rstrip())

if I added b'' it still gave me byte conversion to str errors. I was also trying to abide by not using shell=True in their recommendations. Let me try your suggestions and get back to you. Thank you!

edit: I did not realize universal_newlines was for the input! I thought it was to receive the output as string.

ADD REPLY
0
Entering edit mode
8.2 years ago
John 13k

I'm off home, so I will type this under the assumption that universal_newlines=True does not effect anything. It might, however, since using that will feed blat "/n" instead of "/r/n" or whatever the line endings originally were in nonref.fa, which if blat is specifically looking for might make it think there is only 1 sequence and it's invalid. Potentially. I don't have any of this software installed so i'm shooting from the hip.

Assuming it's literally just the fact that you're subprocessing blat and not running it in a standard terminal, I invite you to try running in the terminal:

blat /myseq/hg19.fasta nonref.fa -out=blast8 blat.blast8 > this_is_what_python_gets_via_PIPE

I have a suspicion that in the file this_is_what_python_gets_via_PIPE, you will see the same result as what you saw in Python. If that is the case there are 2 work arounds. You can either try and fool blat into thinking you really are attached to a terminal (probably a bad idea) with ptys, or you can give blat the input data on the stdin as I imagine thats what it wants. For example by rewriting the command as:

cat nonref.fa | blat /myseq/hg19.fasta -out=blast8 blat.blast8 > this_is_what_python_gets_via_PIPE

You could also get python to feed blat the file over the stdin, but now the universal_newlines thing becomes an issue. If the above gives you what you need when run via the command line, then you can just paste it into the python in place of your original command and call it a day.

ADD COMMENT
1
Entering edit mode

I updated the OP. Seems like it was stdout that was making it run funny. For whatever reason.

ADD REPLY
0
Entering edit mode

Glad you got it working mate :)

ADD REPLY
0
Entering edit mode

So far I have learned that Blat, at least through the terminal, does not accept stdin.

I also removed the universal_newlines flag and I still get an empty output file. I do notice I can't capture the output if I don't use universal_newlines for whatever reason (I get a never ending loop of >>> b'').

ADD REPLY

Login before adding your answer.

Traffic: 1854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6