why does this pipe work
4
2
7.6 years ago
nkinney06 ▴ 140

While looking for a way to check whether a BAM file is truncated, I noticed that this does the job very quickly:

samtools view file.bam | 1>/dev/null

The thing is, I typed the pipe by accident. The command doesn't make sense to me, but when you remove the pipe it takes much longer (I'm not sure it ever finishes). My question is: why does this work, and what is going on?

software error bash • 3.0k views
ADD COMMENT
2
7.6 years ago

When you run it without the pipe symbol "|":

samtools view file.bam 1>/dev/null

1 is interpreted as the file descriptor for stdout, so it is exactly the command we are all used to:

samtools view file.bam > /dev/null

You know this one. If you replace "/dev/null" with "your.SAM", it converts the BAM file into a SAM file without the header. It takes a very long time: /dev/null just discards all the data written to it, but samtools still has to produce that data first, so the whole file is read anyway.

When you add the "|" pipe symbol the way you mentioned:

samtools view file.bam | 1>/dev/null

Your first command tries to read the whole bam file without the header and pipe it (kind of sam file) via stdout into a second command, in your case it is "1". Most likely you do not have program or script called "1" in your system. So the second command will end immediately with an error, and since it is the last command in your pipe it will terminate the whole pipe. What you see on your screen is whatever the first command had time to write to stderr, since only stdout goes into the pipe. When the BAM file is truncated, stderr contains a message saying so; when it is a valid BAM file, stderr is empty, so there is no message. You can test all this by running:

samtools view file.bam | biostars

Since you most likely do not have a program called "biostars" on your system, it will behave almost like your "samtools view file.bam | 1>/dev/null", except that it will clearly tell you that no "biostars" command was found. Then you can run the command

samtools view file.bam 2> err.txt

and terminate it immediately with Control+C, since it will try to print the whole SAM file to your stdout. Now take a look at the contents of err.txt. It will be empty for a proper BAM file and will contain an error message for a truncated one. This means that samtools had already raised the error before you terminated it manually, a moment after it started to read the BAM file. So if you do

samtools view file.bam 2> err.txt | wow

you will find out from err.txt whether the BAM file is normal or truncated, provided you do not have a program called "wow" on your system =)
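The mechanism can be tried without samtools or a BAM file. The sketch below is only an illustration: "fake_view" is a hypothetical stand-in that, like the behaviour described above, writes a warning to stderr first and then streams to stdout without end. The reader on the right is "true", which exits at once without reading anything.

```shell
# Hypothetical stand-in for "samtools view" on a truncated file:
# warn on stderr first, then stream to stdout forever.
fake_view() {
    echo "warning: EOF marker is absent" >&2
    yes "a fake SAM record"
}

# "true" exits immediately without reading the pipe. The next write by the
# left-hand side then fails (broken pipe), so the pipeline ends quickly.
# The warning was written to err.txt before that happened.
fake_view 2> err.txt | true

cat err.txt
```

The pipeline returns almost immediately even though the writer would otherwise never stop, and err.txt still holds the warning.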

ADD COMMENT
3
samtools view file.bam | 1>/dev/null
  

Your first command tries to read the whole bam file without the header and pipe it (kind of sam file) via stdout into a second command, in your case it is "1". Most likely you do not have program or script called "1" in your system. So the second command will end immediately with an error

Actually, no: 1 is taken as a command name only if a space is inserted after it (unlike in the original example).

$ samtools view file.bam | 1>/dev/null
$ samtools view file.bam | 1 >/dev/null
-bash: 1: command not found
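The difference can also be seen in the exit status. This is a small sketch using echo in place of samtools; it assumes there is no executable called "1" on your PATH, and the variable names are made up for the illustration.

```shell
# Assumption: no executable named "1" exists on PATH.

# No space: "1" is read as a file-descriptor number belonging to ">",
# so the right-hand side is a bare redirection with no command name.
echo hi | 1>/dev/null
no_space_status=$?

# With a space: "1" is an ordinary word, so the shell tries to run it
# as a command and reports "command not found".
with_space_status=0
echo hi | 1 >/dev/null || with_space_status=$?

echo "no space:   exit status $no_space_status"
echo "with space: exit status $with_space_status"
```

The first pipeline should report 0 and the second 127, the status shells use for a command that could not be found.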
ADD REPLY
0

This is an interesting point. The output differs in appearance, but my understanding was that the metacharacters | > space tab < ; & ( ) are all used by bash to separate words, so 1 would be treated as a command in both cases. We probably need someone who understands bash better to explain why stderr is printed in one situation but not the other, and whether 1 is really not a command when there is no space between 1 and >.

ADD REPLY
0

That would be a good question for UNIX StackExchange. I tried

echo "Hello" 1>tmp #cat tmp yields "Hello"
echo "Hello" 2>tmp >&2 #cat tmp yields "Hello", but this is from stderr
echo "Hello" | 1>tmp #cat tmp yields nothing
echo "Hello" | 2>tmp #cat tmp yields nothing
echo "Hello" >&2 | 2>tmp #"Hello" is printed to stderr (console), cat tmp yields nothing since pipe only uses stdout

Nowhere do I see an error, so I'm sure 1> is not being taken as a command anywhere.

ADD REPLY
2

It's because the stuff after the | isn't being fed to a command, or rather, it's being fed to a null command, the output of which (which is nothing) is sent to tmp. So the 1>tmp bit is working exactly as you expect, it's just that there's an invisible null command between the | and the 1. "true" and "false" are also both null commands, so you can replicate with them:

echo "Hello" | false 1>tmp #cat tmp yields nothing
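To make the comparison concrete, here is a small sketch that puts a bare redirection, the ":" builtin, and a command that really reads its input side by side. The file names are arbitrary.

```shell
# Right-hand side is only a redirection: the file is created,
# but nothing reads the pipe.
echo "Hello" | 1>tmp_bare

# ":" is the shell's explicit no-op builtin; it also ignores its input.
echo "Hello" | : 1>tmp_colon

# For contrast, "cat" does read the pipe and copies it to the file.
echo "Hello" | cat 1>tmp_cat

# Report the size of each file.
for f in tmp_bare tmp_colon tmp_cat; do
    printf '%s: %s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
done
```

The first two files end up empty, while tmp_cat holds the 6 bytes of "Hello" plus the newline.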
ADD REPLY
1

The command being executed is equivalent to :, as I understand. Bash continues to amaze me.

ADD REPLY
2
7.6 years ago
Charles Plessy ★ 2.9k

Actually, your command does not work: if you replace /dev/null with a file name, you will see that the resulting file is empty when the pipe symbol is present.

That said, I do not understand why it is not a syntax error.

ADD COMMENT
2

Why is it not a syntax error? Because >/dev/null is a valid shell expression in itself. The way the bash parser is implemented it seems to accept

shell-exp -> shell-exp [ | shell-exp]
shell-exp -> redirection   # e.g. >/dev/null
shell-exp -> variable-exp # e.g. $VAR
...
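Both of those cases can be tried on their own, outside any pipeline. This is only a sketch of the behaviour and not a statement about how the bash parser is written internally; the file and variable names are made up.

```shell
# A command made of nothing but a redirection:
# it creates (or truncates) the file and succeeds.
> bare_redirect.txt
redirect_status=$?

# A command made of nothing but an expansion that turns out to be empty:
# no command name is left after expansion, so nothing is run.
EMPTY_VAR=""
$EMPTY_VAR
expansion_status=$?

echo "bare redirection: exit status $redirect_status"
echo "empty expansion:  exit status $expansion_status"
```

Both should report 0, which is why the same text on the right of a pipe is accepted as well.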
ADD REPLY
0

Thanks! Here is also an interesting link to an answer on Stack Overflow, posted by Alex Reynolds in an answer that mysteriously disappeared from this discussion.

ADD REPLY
0

If you want a shell that is more strict and throws an error in this case, try csh:

% echo hi | > /dev/null
 Invalid null command.
ADD REPLY
0

Why would someone switch to csh from any Bourne-flavor shell?

ADD REPLY
2
7.6 years ago
John 13k

To do this without using samtools, you can do something like:
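One way this could look, as a sketch and not necessarily what was originally posted: the SAM/BAM specification defines a fixed 28-byte empty BGZF block as the end-of-file marker, so the last 28 bytes of a file can be compared against it. The function name "has_bgzf_eof" and the demo file names are made up for the illustration, and the demo files are synthetic, not real BAMs.

```shell
# 28-byte empty BGZF block defined by the SAM/BAM specification as the EOF marker.
BGZF_EOF_HEX=$(printf '%s' \
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000" | tr -d ' ')

# Hypothetical helper: succeeds if the file ends with the EOF marker.
has_bgzf_eof() {
    local tail_hex
    # Hex-encode the last 28 bytes of the file and compare them to the marker.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Demo on synthetic files: one ends with the marker, one does not.
printf 'payload' > with_eof.bin
printf '\037\213\010\004\000\000\000\000\000\377\006\000\102\103\002\000\033\000\003\000\000\000\000\000\000\000\000\000' >> with_eof.bin
printf 'payload' > without_eof.bin

has_bgzf_eof with_eof.bin && echo "with_eof.bin: marker present"
has_bgzf_eof without_eof.bin || echo "without_eof.bin: marker missing"
```

On a real file this would be called as "has_bgzf_eof file.bam". It only tells you whether the marker is there, nothing more.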

Also note that checking for truncation is ultimately a question of file integrity, and this is a bad way to test it. One should validate the BAM fully and then produce a checksum. Before anything meaningful is calculated from the BAM file, the checksum should be regenerated and compared. Many tools, I imagine, will put an EOF marker on the end of a prematurely finished BAM file. Two files concatenated together will also have the correct EOF marker. For all intents and purposes, EOF markers are a dumb way to infer the integrity of a file.
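The checksum workflow could look roughly like this. It is a sketch that assumes the GNU coreutils "sha256sum" tool is available (on macOS, "shasum -a 256" plays the same role); the file here is a stand-in, not a real BAM, and the variable names are made up.

```shell
# Stand-in file; in practice this would be a BAM that has already been fully validated.
printf 'pretend this is a validated BAM\n' > example.bam

# Record the checksum once, right after validation.
sha256sum example.bam > example.bam.sha256

# Later, before using the file, regenerate the checksum and compare.
if sha256sum -c example.bam.sha256 >/dev/null 2>&1; then
    first_check="match"
else
    first_check="mismatch"
fi

# Simulate damage by appending one byte, then check again.
printf 'x' >> example.bam
if sha256sum -c example.bam.sha256 >/dev/null 2>&1; then
    second_check="match"
else
    second_check="mismatch"
fi

echo "before modification: $first_check"
echo "after modification:  $second_check"
```

Unlike the EOF marker, the checksum changes for any modification of the file, including concatenation or damage in the middle.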

ADD COMMENT
