Question

Extracting variants from dbNSFP

0

4.2 years ago

NGSCanBioinf ▴ 10

Hello, is there any way to extract specific variants from a dbNSFP file by providing the chromosomal location on the command line?

annotation • 1.5k views

updated 4.2 years ago by Jorge Amigo 14k • written 4.2 years ago by NGSCanBioinf ▴ 10

score 1 · Answer 1 · 2020-11-03

1

4.2 years ago

Pierre Lindenbaum 165k

Use tabix.

score 1 · Answer 2 · 2020-11-03

Just to elaborate a little on Pierre's answer...

I agree that tabix would be the fastest way to do it, but dbNSFP comes as a single zip file containing one gzipped (not bgzipped) file per chromosome, and tabix can only index bgzipped files. So the best way would be to unzip the particular chromosome you're interested in, bgzip it, index it with tabix, and then query it with tabix.

Say you're interested in position 21:5011803. You could then go for something like this:

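# extract chr21 from the zip and re-compress it with bgzip
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz \
| gunzip | bgzip > dbNSFP4.1a_variant.chr21.gz
# index it: position is in column 2 (-b2 -e2), skip the header line (-S1)
tabix -b2 -e2 -S1 dbNSFP4.1a_variant.chr21.gz
# query a single position
tabix dbNSFP4.1a_variant.chr21.gz 21:5011803-5011803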
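Since the question was about passing the location on the command line, the query step can be wrapped in a small function. This is only a sketch: `dbnsfp_query` is a name I made up, and it assumes the file has already been bgzipped and indexed with tabix using column 1 as chromosome and column 2 as position.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tabix or dbNSFP): query a single position
# or a range from a bgzipped, tabix-indexed dbNSFP chromosome file.
# Usage: dbnsfp_query <indexed_file> <chrom> <start> [end]
dbnsfp_query() {
    local file=$1 chrom=$2 start=$3
    local end=${4:-$start}   # no end given: query a single position
    # -h also prints the leading header lines that start with '#'
    # (assumption: the dbNSFP column header line starts with '#chr')
    tabix -h "$file" "${chrom}:${start}-${end}"
}

# Example calls, commented out because they need the indexed file:
# dbnsfp_query dbNSFP4.1a_variant.chr21.gz 21 5011803
# dbnsfp_query dbNSFP4.1a_variant.chr21.gz 21 5011000 5012000
```

Keeping the header in the output means the result can be read directly as a table with column names.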

You could even go for a simple grep if you don't want to generate any intermediate files:

# write the header line first
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zcat | head -1 > result.tab
# then append the rows for chromosome 21, position 5011803
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zgrep -P "^21\t5011803\t" >> result.tab
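The same no-intermediate-files idea can be done in a single pass over the archive with awk instead of two separate unzip calls. A minimal sketch, assuming chromosome and position sit in columns 1 and 2 and the first line is the header; `dbnsfp_filter` is a hypothetical name of mine:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a decompressed, tab-separated dbNSFP stream on
# stdin and print the header line plus every row matching chrom and pos.
# Usage: ... | dbnsfp_filter <chrom> <pos>
dbnsfp_filter() {
    local chrom=$1 pos=$2
    # NR == 1 keeps the header; the other condition compares whole fields,
    # so position 5011803 does not match 15011803
    awk -F'\t' -v chrom="$chrom" -v pos="$pos" \
        'NR == 1 || ($1 == chrom && $2 == pos)'
}

# Example call, commented out because it needs the dbNSFP zip:
# unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | gunzip \
#     | dbnsfp_filter 21 5011803 > result.tab
```

This still reads the whole chromosome file on every query, so for repeated lookups the tabix index will be much faster.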