Why doesn't grep work when used on a gzipped (.gz) VCF file?
3
1
2.1 years ago
FRANCESCA ▴ 10

Hi,

I'm running the command below in the terminal on my MacBook Air, but it doesn't work as written:

cat 20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.recalibrated_variants.vcf.gz | grep -w -f chrX.posi.txt > chrx.com.txt

Do you have any suggestions?

Thank youuuu!

ADD COMMENT
2

Welcome. 1) Please make the title more informative so people with the same issue can find your question and its answer, and 2) "doesn't work" isn't a very helpful way of describing a problem. Be specific about what exactly doesn't work and how it fails, and you will get a better answer. Thanks.

ADD REPLY
0

One can linuxify macOS to use GNU versions: https://github.com/darksonic37/linuxify

ADD REPLY
0

Uhhhh, why would I do that???

ADD REPLY
0

probably to make commands work the same way ...

simple commands like sed, or even something as basic as ls, work differently on the BSD-derived macOS

the differences are subtle enough to make debugging quite difficult

ADD REPLY
0

These sorts of differences, plus annoyances with compilation on different platforms, are why I use Ubuntu-based containers for almost everything these days, but this is a different discussion...

ADD REPLY
0

Sorry, I was just kidding. It is most important to know about these subtle differences and why they exist, but I sort of grew up with this. Since these BSD tools come as part of macOS, I would personally never replace them; at most I would install the GNU versions in addition and put them on my PATH. There might be some side effects, though that is rather unlikely.

ADD REPLY
3
2.1 years ago
4galaxy77 2.9k

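You are trying to cat a gzipped file. That isn't going to work, because cat passes the compressed bytes through unchanged and grep never sees the text; you need to use gzip -cd to decompress to stdout. In general, don't use Unix tools like grep to filter a VCF unless you know how they work. It is better to use bcftools instead.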
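A minimal sketch of the gzip -cd approach, run on a tiny made-up VCF-like file rather than the real data. The file names demo.vcf.gz and posi.txt are hypothetical stand-ins, and the positions file is assumed to hold one position per line.

```shell
# Build a tiny gzipped VCF-like file (made-up data, for illustration only).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchrX\t100\t.\tA\tG\nchrX\t200\t.\tC\tT\nchrX\t300\t.\tG\tA\n' \
  | gzip -c > "$tmpdir/demo.vcf.gz"

# One position per line; hypothetical stand-in for chrX.posi.txt.
printf '100\n300\n' > "$tmpdir/posi.txt"

# Decompress to stdout, then keep lines matching any pattern as a whole word.
gzip -cd "$tmpdir/demo.vcf.gz" | grep -w -f "$tmpdir/posi.txt" > "$tmpdir/matches.txt"

# Show the records that matched.
cat "$tmpdir/matches.txt"
```

Note that grep -w matches a pattern anywhere on the line, not only in the POS column, so a number that also appears in another field would match as well. That is one reason a VCF-aware tool such as bcftools is the safer choice.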

ADD COMMENT
1

The BSD zcat shipped with macOS (part of Apple gzip), unlike GNU zcat, looks for .Z files (compressed with compress, not zip). So on a Mac this will not work as it would on Linux:

% zcat debug.txt.gz
zcat: can't stat: debug.txt.gz (debug.txt.gz.Z): No such file or directory
% zcat --version
Apple gzip 321.100.11

Indeed, there is some BSD logic to it: zcat => .Z files, gzcat => .gz files, bzcat => .bz2 files (while gzcat and zcat are both part of Apple gzip....)
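To sidestep the zcat/gzcat naming difference in scripts, one option is a small wrapper that chooses the decompressor from the file extension. This is only a sketch: zread is a hypothetical helper name, and the demo files are made up.

```shell
# zread: hypothetical helper that picks a decompressor from the file extension.
zread() {
  case "$1" in
    *.gz|*.Z) gzip -cd "$1" ;;   # gzip -cd also reads compress (.Z) files
    *.bz2)    bzip2 -cd "$1" ;;  # bzip2 -cd is what bzcat does
    *)        cat "$1" ;;        # anything else is passed through as-is
  esac
}

# Made-up demo files.
tmpdir=$(mktemp -d)
printf 'hello from gzip\n' | gzip -c > "$tmpdir/debug.txt.gz"
printf 'plain text\n' > "$tmpdir/plain.txt"

zread "$tmpdir/debug.txt.gz"
zread "$tmpdir/plain.txt"
```

Because it calls gzip -cd rather than zcat, the same function should behave the same way with both Apple gzip and GNU gzip.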

ADD REPLY
0

I would simply use gzip -cd. It does the same thing (decompresses and sends the result to stdout), and you never run into this BSD nonsense :)

ADD REPLY
0

I would say the BSD way is more logical than the GNU way in this case, and IMHO it is often cleaner. Still, for bioinformatics Linux won, while on desktops BSD is ahead (only through macOS).

ADD REPLY
0

For personal use I do not really care, but if you write pipelines or workflows it is utterly annoying that you cannot run them natively on both macOS and Linux because of these differences... and this is where containerization comes into play. Luckily, macOS supports Docker, albeit via a Linux VM rather than natively.

ADD REPLY
2
2.1 years ago
Michael 55k

It's good practice to give the actual error message and not just "it doesn't work". However, in this case it is clear that you have to use gzcat (or bzcat, if the file were bzip2-compressed) instead of cat, because you are working on a compressed file.

ADD COMMENT
0

Thank you and I'm sorry :D

ADD REPLY
1
2.1 years ago
seidel 11k

Compressed files are in a binary format. It's tempting to decompress them so you can read them, but you can do this "on the fly" without modifying the original file, and then pipe the result to grep. As others have suggested, gzcat is a tool which decompresses a .gz file and dumps the results to stdout, without modifying the original file. Another way to do the same is: gunzip -c filename.gz | grep pattern. It's sort of natural to think of gzip and gunzip as a pair for compressing and decompressing files, but gunzip -c decompresses nondestructively so you can read the file in place and do things with the content without altering the file.
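The nondestructive behaviour of gunzip -c can be checked with a small sketch like the one below. The file contents are made up, and cksum is used here only as a convenient way to confirm that the compressed file is byte-for-byte unchanged.

```shell
# Create a small made-up compressed file.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' | gzip -c > "$tmpdir/filename.gz"

# Checksum the compressed file, read it through gunzip -c, checksum it again.
before=$(cksum < "$tmpdir/filename.gz")
hits=$(gunzip -c "$tmpdir/filename.gz" | grep beta)
after=$(cksum < "$tmpdir/filename.gz")

# Print the matching line, then confirm the .gz file was left alone.
echo "$hits"
[ "$before" = "$after" ] && echo "compressed file unchanged"
```

Without -c, gunzip would replace filename.gz with the decompressed file, which is the behaviour the pipe-based approach avoids.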

ADD COMMENT
