Hello, is there any way to extract specific variants from a dbNSFP file by providing the chromosome location on the command line?
Just to elaborate a little on Pierre's answer...
I agree that tabix would be the fastest way to do it. However, dbNSFP comes as a single zip file containing per-chromosome files that are gzipped, not bgzipped, and tabix can only index bgzip-compressed files. So the best way would be to unzip the particular chromosome you're interested in, bgzip it, tabix index it, and then query it with tabix.
Say you're interested in position 21:5011803, then you should go for something like this:
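# Extract chr21 from the zip, decompress it, and recompress it with bgzip
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz \
| gunzip | bgzip > dbNSFP4.1a_variant.chr21.gz
# Index it: position is in column 2 (start and end), skip the header line
tabix -b2 -e2 -S1 dbNSFP4.1a_variant.chr21.gz
# Query a single position
tabix dbNSFP4.1a_variant.chr21.gz 21:5011803-5011803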
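If you have several positions to look up, tabix can take them from a file with -R instead of one region per call. Here is a rough sketch, assuming a made-up positions.txt with one chr:pos per line (the second position is only an illustrative value, not a known variant). As far as I understand the tabix manual, a tab-delimited file without a .bed extension is read as CHROM, POS, POS_TO with 1-based inclusive coordinates, so double-check that against your tabix version.

```shell
# positions.txt is a hypothetical input: one "chr:pos" per line
printf '21:5011803\n21:5011900\n' > positions.txt

# Convert to tab-delimited CHROM, POS, POS_TO (start and end are the same position)
awk -F: -v OFS='\t' '{print $1, $2, $2}' positions.txt > regions.tsv

# Query all regions in one call, but only if tabix and the index are actually there
if command -v tabix >/dev/null 2>&1 && [ -f dbNSFP4.1a_variant.chr21.gz.tbi ]; then
    tabix -R regions.tsv dbNSFP4.1a_variant.chr21.gz
fi
```

The regions file is kept separate from the query so you can inspect it before running tabix.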
You could even go for a simple grep if you don't want to generate any intermediate files:
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zcat | head -1 > result.tab
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zgrep -P "^21\t5011803\t" >> result.tab
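The two commands above read the archive twice, once for the header and once for the match. A possible single-pass alternative with awk is sketched below. The function name extract_variant is made up for illustration, and it assumes the chromosome and position are the first two tab-separated columns, which is what the grep pattern above relies on as well.

```shell
# extract_variant CHR POS (hypothetical helper):
# reads decompressed dbNSFP text on stdin and prints the header (first line)
# plus every row whose first two columns equal CHR and POS.
extract_variant() {
    awk -F'\t' -v chr="$1" -v pos="$2" 'NR == 1 || ($1 == chr && $2 == pos)'
}

# Usage, with the same file names as above:
# unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | gunzip -c \
#   | extract_variant 21 5011803 > result.tab
```

Matching on whole fields rather than a regex also avoids needing grep -P, which is not available in every grep build.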
Thank you very much, this is useful!